Occluded Object Segmentation Based on Bilayer Decoupling Strategy and Attention Mechanism

LÜ Yue, ZHOU Zhequan, LÜ Shujing

Citation: LÜ Yue, ZHOU Zhequan, LÜ Shujing. Occluded Object Segmentation Based on Bilayer Decoupling Strategy and Attention Mechanism[J]. Journal of Electronics & Information Technology, 2023, 45(1): 335-343. doi: 10.11999/JEIT211288

doi: 10.11999/JEIT211288
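Details
    Author biographies:

    LÜ Yue: male, Ph.D., professor. Research interests: pattern recognition, image processing, intelligent Internet of Things, machine intelligence, and machine vision systems.

    ZHOU Zhequan: male, master's student. Research interests: pattern recognition and image processing.

    LÜ Shujing: female, Ph.D., full-time researcher. Research interests: pattern recognition, image processing, and machine learning.

    Corresponding author:

    LÜ Yue, ylu@cs.ecnu.edu.cn

  • CLC number: TN911.73; TP391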

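  • Abstract: Occluded object segmentation is a difficult problem in instance segmentation, yet it has strong practical value in many applications, such as segmenting stacked express parcels on logistics conveyor lines. To address the difficulty of segmenting parcels that occlude one another, this paper proposes an occluded object segmentation method based on a bilayer decoupling strategy and an attention mechanism. First, a backbone network with a Feature Pyramid Network (FPN) extracts image features. Next, a bilayer decoupling detection head automatically predicts whether the centroid of each instance is occluded and uses separate branches to detect the two types of instances. Then, an attention refinement module obtains the predicted masks of the unoccluded instances and merges them into a single attention weight map. Finally, the attention refinement module uses this attention weight map to help the occluded instances obtain their segmentation results. An instance segmentation dataset of occluded express parcels was collected, and experiments were conducted on it. The results show that the method reaches an Average Precision (AP) of 95.66%, a Recall of 97.17%, and a miss rate (MR$^{-2}$) of 11.78%, outperforming the other methods compared.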
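The last two steps described in the abstract (merging the masks of unoccluded instances into one attention weight map, then using that map when segmenting occluded instances) can be illustrated with a minimal sketch. The abstract does not say how the masks are merged or how the map is applied, so the pixel-wise maximum, the `1 - coverage` weighting, the function names, and the `score_thresh` parameter below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def attention_weight_map(unoccluded_masks, scores, score_thresh=0.5):
    """Merge masks of unoccluded instances into one (H, W) weight map.

    unoccluded_masks: (N, H, W) mask probabilities in [0, 1].
    scores: (N,) detection confidences.
    Assumption: pixels already covered by a confident unoccluded
    instance get a low weight, so they are suppressed later.
    """
    masks = np.asarray(unoccluded_masks, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = scores >= score_thresh            # hypothetical confidence filter
    if not keep.any():
        return np.ones(masks.shape[1:])      # nothing to suppress
    coverage = masks[keep].max(axis=0)       # pixel-wise union of kept masks
    return 1.0 - coverage

def refine_features(features, weight_map):
    """Scale a (C, H, W) feature map by the (H, W) attention weights."""
    return np.asarray(features, dtype=float) * weight_map[None, :, :]
```

Under these assumptions, a mask head run on the output of `refine_features` would see weaker responses where an unoccluded parcel has already been segmented, which is one plausible way such a map could help separate an occluded parcel from the one lying on top of it.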
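  • Figure 1  Network structure of CondInst

    Figure 2  Failure cases of CondInst

    Figure 3  Network structure of CondInst with the bilayer decoupling strategy and attention mechanism

    Figure 4  Images from the occluded express parcel dataset

    Figure 5  Visual comparison of the results of the proposed method and the baseline method

    Figure 6  Visual comparison for the attention refinement module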

    Table 1  Segmentation results of different methods under different NMS strategies (%)

    NMS strategy  Method                      AP     Recall  MR$^{-2}$
    Box-NMS       Mask RCNN                   92.21  93.75   16.79
                  CondInst                    92.17  93.27   15.96
                  CondInst+bilayer            92.83  94.38   18.96
                  CondInst+bilayer+attention  94.94  96.00   15.47
    Matrix-NMS    Mask RCNN                   91.12  91.54   16.94
                  CondInst                    91.49  92.53   16.01
                  CondInst+bilayer            95.14  96.85   12.75
                  CondInst+bilayer+attention  95.66  97.17   11.78
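For readers unfamiliar with the Matrix-NMS strategy compared in Table 1, the following is a minimal NumPy sketch of the Gaussian-kernel variant as introduced with SOLOv2. It assumes a single object class and binary masks; the function name, the `gauss_sigma` parameter name, and its default value are choices made for this example, and the code is not taken from the paper.

```python
import numpy as np

def matrix_nms(masks, scores, gauss_sigma=2.0):
    """Decay scores of overlapping masks instead of discarding them.

    masks: (N, H, W) binary masks; scores: (N,) confidences.
    Returns (order, decayed_scores), both sorted by original score.
    """
    masks = np.asarray(masks, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)                      # highest score first
    flat = masks[order].reshape(len(order), -1)
    sorted_scores = scores[order]

    inter = flat @ flat.T                            # pairwise intersections
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # iou[i, j] with i < j: overlap of mask j with higher-scored mask i
    iou = np.triu(inter / np.maximum(union, 1e-9), k=1)

    # Largest overlap of each mask with any higher-scored mask
    max_iou = iou.max(axis=0)
    decay = np.exp(-gauss_sigma * iou ** 2)
    # Row i is compensated by how strongly mask i was itself suppressed
    compensate = np.exp(-gauss_sigma * max_iou ** 2)[:, None]
    coeff = (decay / compensate).min(axis=0)
    return order, sorted_scores * coeff
```

A duplicate of a higher-scored mask receives a strongly reduced score, while a mask that overlaps nothing keeps its score unchanged; a final score threshold (not shown) then selects the surviving instances.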

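    Table 2  Comparison with occluded object detection methods (%)

    Method                                    AP     Recall  MR$^{-2}$
    CondInst+Soft-NMS                         94.88  97.67   20.55
    CrowdDet                                  94.22  96.00   17.85
    Proposed method (mask-based)              94.72  96.74   14.82
    Proposed method (bounding-box-head-based) 95.33  97.72   14.38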

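    Table 3  Segmentation results for occluded and unoccluded instances (%)

                                Unoccluded instances  Occluded instances
    Method                      AP     AR             AP     AR
    CondInst                    94.62  94.70          69.03  77.93
    CondInst+bilayer            96.52  95.59          83.57  91.80
    CondInst+bilayer+attention  96.62  95.70          84.47  92.52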

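    Table 4  Effect of the number of shared convolutional layers on model performance

    Number of shared convolutional layers  AP (%)  Recall (%)  MR$^{-2}$ (%)
    4                                      95.12   97.11       17.45
    3                                      95.22   96.53       12.87
    2                                      95.66   97.17       11.78
    1                                      95.42   96.87       10.87
    0                                      95.32   96.78       11.23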

    Table 5  Effect of the confidence threshold $ \sigma $ on model performance

    Confidence threshold $ \sigma $  AP (%)  Recall (%)  MR$^{-2}$ (%)
    0.50                             95.27   96.45       12.41
    0.60                             95.32   96.48       12.10
    0.65                             95.66   97.17       11.78
    0.70                             95.31   96.32       12.07
    0.90                             94.26   95.68       14.79
    1.00                             94.21   95.60       14.79
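The MR$^{-2}$ column in the tables above is, by the usual convention in the pedestrian detection literature, the log-average miss rate: the geometric mean of the miss rate sampled at nine false-positives-per-image (FPPI) values spaced evenly in log space over [10$^{-2}$, 10$^{0}$]. The sketch below shows that computation under two assumptions that this page does not confirm for the paper: the curve is given as FPPI values in ascending order, and a reference point below the smallest observed FPPI is assigned a miss rate of 1.0. The function name is illustrative.

```python
import numpy as np

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rate at log-spaced FPPI reference points.

    fppi, miss_rate: 1-D arrays describing the miss-rate/FPPI curve,
    with fppi in ascending order (assumption).
    """
    fppi = np.asarray(fppi, dtype=float)
    miss_rate = np.asarray(miss_rate, dtype=float)
    refs = np.logspace(-2.0, 0.0, num_points)    # 0.01 ... 1.0
    samples = []
    for r in refs:
        idx = np.where(fppi <= r)[0]
        # Use the last curve point not exceeding r; fall back to 1.0
        # when the curve has no such point (assumed convention).
        samples.append(miss_rate[idx[-1]] if idx.size else 1.0)
    samples = np.maximum(samples, 1e-10)         # avoid log(0)
    return float(np.exp(np.mean(np.log(samples))))
```

Lower values are better, which is why the tables treat a drop in MR$^{-2}$ as an improvement alongside a rise in AP and Recall.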
Publication history
  • Received:  2021-11-18
  • Revised:  2022-02-24
  • Accepted:  2022-03-01
  • Published online:  2022-03-08
  • Issue date:  2023-01-17
