Attention Based Single Shot Multibox Detector

Hui ZHAO, Zhiwei LI, Tianqi ZHANG

Citation: Hui ZHAO, Zhiwei LI, Tianqi ZHANG. Attention Based Single Shot Multibox Detector[J]. Journal of Electronics & Information Technology, 2021, 43(7): 2096-2104. doi: 10.11999/JEIT200304

doi: 10.11999/JEIT200304
Funds: The National Natural Science Foundation of China (61671095)
Author information:

    Hui ZHAO: female, born in 1980, professor and Ph.D. supervisor; her research interests include deep-space communication, signal theory and information processing, and image information processing

    Zhiwei LI: male, born in 1996, master's student; his research interests include computer vision, deep learning, and object detection

    Tianqi ZHANG: male, born in 1971, professor and Ph.D. supervisor; his research interests include intelligent signal processing for wireless communications, communication anti-jamming, and information countermeasures

    Corresponding author: Hui ZHAO, zhaohui@cqupt.edu.cn

  • CLC number: TN911.73; TP391.41
  • Abstract: The Single Shot multibox Detector (SSD) is an object detection algorithm that strikes a good balance among simplicity, speed, and accuracy. However, each detection layer in the SSD architecture is used in isolation, so feature information is not fully exploited and small-object detection is not sufficiently robust. This paper proposes an Attention-based Single Shot multibox Detector (ASSD). ASSD first applies the proposed bidirectional feature fusion module to fuse feature information and obtain feature layers rich in both detail and semantic information, and then uses the proposed joint attention unit to further mine salient feature information and guide model optimization. Finally, a series of experiments on public datasets shows that ASSD effectively improves the detection accuracy of the conventional SSD algorithm and is especially well suited to small-object detection.
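    The paper itself provides no code, but as an illustration of the kind of attention described in the abstract, the sketch below applies an SE-style channel attention [25] followed by a simple spatial attention map to a fused SSD feature layer in PyTorch. The class name `JointAttentionSketch`, the reduction ratio, and the way the two attention maps are combined are assumptions made for illustration only, not the authors' joint attention unit.

```python
import torch
import torch.nn as nn


class JointAttentionSketch(nn.Module):
    """Illustrative channel + spatial attention block (not the paper's exact JAU)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, then learn per-channel weights (SE-style [25]).
        self.channel_fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: collapse channel statistics into a single saliency map.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight channels of the fused feature map.
        x = x * self.channel_fc(x)
        # Build a spatial map from channel-wise average and max statistics.
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        attention = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attention


# Example: re-weight a 512-channel 38x38 feature map (the conv4_3 scale in SSD300).
feat = torch.randn(1, 512, 38, 38)
print(JointAttentionSketch(512)(feat).shape)  # torch.Size([1, 512, 38, 38])
```

    The output keeps the shape of the input feature layer, so a block of this form can be dropped between feature fusion and the SSD detection heads without changing the rest of the pipeline.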
  • Figure 1  Overall framework of the ASSD algorithm

    Figure 2  Network structure of the module

    Figure 3  Network structure and algorithm flowchart of the joint attention unit

    Figure 4  Comparison of P-R curves of SSD and ASSD

    Figure 5  Comparison of sample detection results of the ASSD and SSD algorithms

    Table 1  ASSD ablation experiments on the Pascal VOC2007 test set (1 M = 10⁶)

    Method            Input size   mAP    #Params    fps
    SSD300            300×300      77.5   50.14 M    60.2
    SSD512            512×512      79.8   51.98 M    25.2
    SSD300+TwFFM      300×300      78.8   64.18 M    47.1
    SSD300+SDPA[22]   300×300      78.2   56.38 M    53.1
    SSD300+SEB[23]    300×300      78.4   54.43 M    51.5
    SSD300+JAU        300×300      78.6   60.67 M    48.1
    ASSD300           300×300      79.1   74.71 M    39.6
    ASSD512           512×512      80.9   76.55 M    20.8

    Table 2  Performance comparison of different algorithms on the Pascal VOC2007 test set

    Method            Backbone     Input size   mAP    fps
    YOLOv2[11]        Darknet-19   416×416      76.8   67
    YOLOv2+[11]       Darknet-19   544×544      78.6   40
    Faster RCNN[7]    ResNet-101   ~1000×600    76.4   2.4
    SSD300            VGG-16       300×300      77.5   60.2
    SSD512            VGG-16       512×512      79.8   25.2
    DSSD321[16]       ResNet-101   321×321      78.6   9.5
    DSSD513[16]       ResNet-101   513×513      81.5   5.5
    RSSD300[20]       VGG-16       300×300      78.5   35.0
    RSSD512[20]       VGG-16       512×512      80.8   16.6
    FSSD300[21]       VGG-16       300×300      78.8   65.8
    FSSD512[21]       VGG-16       512×512      80.9   35.7
    ASSD300           VGG-16       300×300      79.1   39.6
    ASSD512           VGG-16       512×512      81.0   20.8

    Table 3  Per-class performance comparison of the algorithms on the 20 Pascal VOC2007 test classes

    Method    mAP    aero   bike   bird   boat   bottle  bus    car    cat    chair  cow
    SSD300    77.5   79.5   83.9   76.0   69.6   50.5    87.0   85.7   88.1   60.3   81.5
    SSD512    79.8   85.8   85.6   79.5   74.1   59.9    86.8   88.0   89.1   63.8   86.3
    ASSD300   79.1   85.4   84.1   78.7   71.8   54.0    86.2   85.3   89.5   60.4   87.4
    ASSD512   81.0   86.8   85.2   84.1   75.2   60.5    88.3   88.4   89.3   63.5   87.6

    Method    mAP    table  dog    horse  mbike  person  plant  sheep  sofa   train  tv
    SSD300    77.5   77.0   86.1   87.5   84.0   79.4    52.3   77.9   79.5   87.6   76.8
    SSD512    79.8   75.8   87.4   87.5   83.2   83.5    56.3   81.6   77.7   87.0   77.4
    ASSD300   79.1   77.1   87.4   86.8   84.8   79.5    57.8   81.5   80.1   87.4   76.9
    ASSD512   81.0   76.6   88.2   86.7   85.7   82.8    59.2   83.6   80.5   87.5   80.8
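    For readers checking the mAP figures above: on the Pascal VOC2007 benchmark, the per-class average precision is computed from the precision-recall curve with 11-point interpolation [26], and mAP is the mean over the 20 classes. The snippet below is a minimal sketch of that calculation; the synthetic recall/precision arrays are stand-ins for curves produced by ranking real detections.

```python
import numpy as np


def voc2007_average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """11-point interpolated AP as used by the Pascal VOC2007 benchmark [26]."""
    ap = 0.0
    for t in np.arange(0.0, 1.1, 0.1):
        # Interpolated precision: best precision at any recall >= t (0 if none).
        mask = recall >= t
        p = precision[mask].max() if mask.any() else 0.0
        ap += p / 11.0
    return ap


# Synthetic example curve (real curves come from ranked detections on the test set).
recall = np.array([0.1, 0.4, 0.7, 0.9])
precision = np.array([0.95, 0.90, 0.80, 0.60])
print(round(voc2007_average_precision(recall, precision), 3))
# mAP is then the mean of the per-class AP values, e.g. over the 20 VOC classes.
```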
  • [1] ZHAO Feng, SUN Wenjing, LIU Hanqiang, et al. Intuitionistic fuzzy clustering image segmentation based on flower pollination optimization with nearest neighbor searching[J]. Journal of Electronics & Information Technology, 2020, 42(4): 1005–1012. doi: 10.11999/JEIT190428
    [2] SUN Yanjing, SHI Yunkai, YUN Xiao, et al. Adaptive strategy fusion target tracking based on multi-layer convolutional features[J]. Journal of Electronics & Information Technology, 2019, 41(10): 2464–2470. doi: 10.11999/JEIT180971
    [3] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587. doi: 10.1109/CVPR.2014.81.
    [4] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904–1916. doi: 10.1109/TPAMI.2015.2389824
    [5] GIRSHICK R. Fast R-CNN[C]. Proceedings of 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440–1448. doi: 10.1109/ICCV.2015.169.
    [6] SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. Proceedings of the 3rd International Conference on Learning Representations, San Diego, USA, 2015.
    [7] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [8] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 779–788. doi: 10.1109/CVPR.2016.91.
    [9] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 21–37. doi: 10.1007/978-3-319-46448-0_2.
    [10] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9. doi: 10.1109/CVPR.2015.7298594.
    [11] REDMON J and FARHADI A. YOLO9000: Better, faster, stronger[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 7263–7271. doi: 10.1109/CVPR.2017.690.
    [12] ZEILER M D and FERGUS R. Visualizing and understanding convolutional networks[C]. Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 818–833. doi: 10.1007/978-3-319-10590-1_53.
    [13] CHEN Chenyi, LIU Mingyu, TUZEL O, et al. R-CNN for small object detection[C]. Proceedings of the 13th Asian Conference on Computer Vision, Taipei, China, 2016: 214–230. doi: 10.1007/978-3-319-54193-8_14.
    [14] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [15] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4700–4708. doi: 10.1109/CVPR.2017.243.
    [16] FU Chengyang, LIU Wei, RANGA A, et al. DSSD: Deconvolutional single shot detector[J]. arXiv: 1701.06659, 2017.
    [17] SHEN Zhiqiang, LIU Zhuang, LI Jianguo, et al. DSOD: Learning deeply supervised object detectors from scratch[C]. Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 1919–1927. doi: 10.1109/ICCV.2017.212.
    [18] ZEILER M D and FERGUS R. Visualizing and understanding convolutional networks[C]. Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 818–833. doi: 10.1007/978-3-319-10590-1_53.
    [19] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2117–2125. doi: 10.1109/CVPR.2017.106.
    [20] JEONG J, PARK H, and KWAK N. Enhancement of SSD by concatenating feature maps for object detection[C]. Proceedings of 2017 British Machine Vision Conference, London, UK, 2017.
    [21] LI Zuoxin and ZHOU Fuqiang. FSSD: Feature fusion single shot Multibox detector[J]. arXiv: 1712.00960, 2017.
    [22] MNIH V, HEESS N, GRAVES A, et al. Recurrent models of visual attention[C]. Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2204–2212.
    [23] BAHDANAU D, CHO K, BENGIO Y, et al. Neural machine translation by jointly learning to align and translate[C]. Proceedings of the 3rd International Conference on Learning Representations, San Diego, USA, 2015.
    [24] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 5998–6008.
    [25] HU Jie, SHEN Li, and SUN Gang. Squeeze-and-excitation networks[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 7132–7141. doi: 10.1109/CVPR.2018.00745.
    [26] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal visual object classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4
Figures(5) / Tables(3)
Metrics
  • Article views:  1120
  • Full-text HTML views:  370
  • PDF downloads:  142
  • Citations: 0
Publication history
  • Received: 2020-04-24
  • Revised: 2021-02-15
  • Available online: 2021-03-31
  • Issue published: 2021-07-10
