Object Detection Method for Multi-scale Full-scene Surveillance Based on Attention Mechanism

ZHANG Dexiang, WANG Jun, YUAN Peicheng

Citation: ZHANG Dexiang, WANG Jun, YUAN Peicheng. Object Detection Method for Multi-scale Full-scene Surveillance Based on Attention Mechanism[J]. Journal of Electronics & Information Technology, 2022, 44(9): 3249-3257. doi: 10.11999/JEIT210664

doi: 10.11999/JEIT210664
Funding: National Key Research and Development Program of China (2018YFB0504604)
    About the authors:

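    ZHANG Dexiang: Male, Professor. Research interests: remote sensing image processing and deep learning.

    WANG Jun: Male, Master's student. Research interests: video and image processing, machine learning.

    YUAN Peicheng: Male, Master's student. Research interests: video and image processing, machine learning.

    Corresponding author:

    ZHANG Dexiang dqxyzdx@126.com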

  • CLC number: TN911.73; TP391.4

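  • Abstract: In complex urban surveillance scenes, large variations in object size, occlusion, and weather conditions make object features indistinct. To address this problem, this paper proposes a multi-scale full-scene surveillance object detection method based on an attention mechanism. A multi-scale detection network based on the Yolov5s model is designed to improve the network's adaptability to changes in object size. In addition, an attention-based feature extraction module is constructed: the network learns channel-wise feature weights that enhance object features and suppress background features, improving the network's feature extraction capability. Finally, the K-means clustering algorithm is used to compute the initial anchor box sizes for the full-scene surveillance dataset, which accelerates model convergence and improves detection accuracy. On the COCO dataset, compared with the baseline network, the mean Average Precision (mAP) improves by 3.7 percentage points and mAP50 by 4.7 percentage points, with a model inference time of only 3.8 ms. On the full-scene surveillance dataset, mAP50 reaches 89.6% and surveillance video is processed at 154 fps, which meets the real-time detection requirements of on-site surveillance.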
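The abstract says only that K-means computes the initial anchor sizes; it does not state the distance metric. The sketch below assumes the 1 - IoU distance that is commonly used for YOLO-style anchor clustering, with the mean width and height of each cluster as its new centre. The function names `iou_wh` and `kmeans_anchors` and the toy box list are hypothetical, chosen for illustration.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes described only by (width, height), both centred at the origin."""
    w1, h1 = box
    w2, h2 = anchor
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchors using d = 1 - IoU (assumed metric)."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # initialise with k ground-truth boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # Largest IoU is the same as smallest 1 - IoU distance
            best = max(range(k), key=lambda j: iou_wh(b, anchors[j]))
            clusters[best].append(b)
        new_anchors = []
        for j, c in enumerate(clusters):
            if not c:  # an empty cluster keeps its previous anchor
                new_anchors.append(anchors[j])
                continue
            # Update the centre with the mean width and height (the median is a common alternative)
            new_anchors.append((sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)))
        if new_anchors == anchors:  # converged
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest anchor first


# Toy (w, h) boxes in pixels: three small objects and three large ones (hypothetical data)
boxes = [(10, 12), (11, 13), (9, 11), (100, 120), (105, 118), (98, 125)]
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

In a YOLO-style head, k is typically the number of detection scales times the anchors per scale, and the sorted anchors are then split across scales from fine to coarse; k = 2 here only keeps the toy example small.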
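  • Figure  1  Yolov5s network structure

    Figure  2  Multi-scale detection network structure

    Figure  3  Structure of the SE module

    Figure  4  SE-CSPNet

    Figure  5  Example images from the dataset

    Figure  6  Comparison of detection results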
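Figures 3 and 4 refer to an SE (Squeeze-and-Excitation) module, which produces the channel-wise weights described in the abstract. The sketch below shows the generic SE computation from Hu et al.'s Squeeze-and-Excitation networks: global average pooling, two fully connected layers with ReLU and sigmoid, then per-channel rescaling. It is not the authors' SE-CSPNet; where the block sits inside CSPNet, the reduction ratio r, and the weights are not given here, so the weight values, r = 2, and the omission of biases are assumptions. Plain lists are used only to keep the example dependency-free.

```python
import math


def sigmoid(v):
    """Logistic function used as the SE gate."""
    return 1.0 / (1.0 + math.exp(-v))


def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a feature map stored as nested lists.

    x  : C channels, each an H x W list of lists
    w1 : (C // r) x C weights of the first fully connected layer (hypothetical values)
    w2 : C x (C // r) weights of the second fully connected layer (hypothetical values)
    Biases are omitted for brevity. Returns (rescaled feature map, channel weights).
    """
    # Squeeze: global average pooling gives one descriptor per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields one weight in (0, 1) per channel
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]
    s = [sigmoid(sum(a * b for a, b in zip(row, hidden))) for row in w2]
    # Scale: multiply every activation of channel c by its weight s[c]
    out = [[[v * sc for v in row] for row in ch] for ch, sc in zip(x, s)]
    return out, s


# C = 4 channels of size 2 x 2, channel c filled with the value c + 1; r = 2 (assumed)
x = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
w1 = [[1, 0, 0, 0], [0, 1, 0, 0]]
w2 = [[10, 0], [-10, 0], [0, 10], [0, -10]]
out, s = se_block(x, w1, w2)
print([round(v, 4) for v in s])
```

With these hand-picked weights, channels 0 and 2 are kept almost unchanged while channels 1 and 3 are suppressed toward zero, which is the "enhance object features, suppress background features" behaviour in miniature. In a trained network, w1 and w2 are learned, and frameworks usually implement the two layers as linear or 1 × 1 convolution layers.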

    Table  1  Ablation results on the COCO dataset

    Method                 mAP50 (%)  mAP (%)  Inference time (ms)
    Yolov5s                55.4       36.7     3.0
    Yolov5s+Attention1     55.7       35.4     3.3
    Yolov5s+Attention2     56.8       37.4     3.2
    Yolov5s+Attention3     53.1       33.0     2.9
    Yolov5s+Attention4     54.2       33.5     2.8
    Yolov5s+Multi-scale    59.0       39.3     3.5
    MODN-BAM               60.1       40.4     3.8
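The COCO gains quoted in the abstract (3.7 for mAP and 4.7 for mAP50) can be checked against Table 1. The short calculation below, using only values copied from the table, shows that they are absolute differences in percentage points; relative to the Yolov5s baseline the same gains are roughly 10.1% and 8.5%. The helper names are illustrative.

```python
# (mAP50, mAP) in % for selected rows of Table 1 (COCO)
TABLE1 = {
    "Yolov5s": (55.4, 36.7),
    "Yolov5s+Attention2": (56.8, 37.4),
    "Yolov5s+Multi-scale": (59.0, 39.3),
    "MODN-BAM": (60.1, 40.4),
}


def absolute_gain(method, baseline="Yolov5s", table=TABLE1):
    """Gain over the baseline in percentage points, rounded to one decimal."""
    (m50, m), (b50, b) = table[method], table[baseline]
    return round(m50 - b50, 1), round(m - b, 1)


def relative_gain(method, baseline="Yolov5s", table=TABLE1):
    """Gain relative to the baseline score, in percent, rounded to one decimal."""
    (m50, m), (b50, b) = table[method], table[baseline]
    return round(100 * (m50 - b50) / b50, 1), round(100 * (m - b) / b, 1)


for name in TABLE1:
    # Each tuple is (mAP50 gain, mAP gain)
    print(name, absolute_gain(name), relative_gain(name))
```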

    Table  2  Ablation results on the Open Images v6 dataset

    Method                 mAP50 (%)  mAP (%)
    Yolov5s                71.2       49.8
    Yolov5s+Attention2     72.1       51.3
    Yolov5s+Multi-scale    74.6       55.5
    MODN-BAM               75.7       56.3

    Table  3  Comparison with other algorithms on the COCO dataset (– : not reported)

    Method               Size      mAP   mAP50  mAP75  mAPs  mAPm  mAPl  fps
    RetinaNet-ResNet101  800×800   37.8  57.5   40.8   20.2  41.1  49.2  11
    YOLOF                800×*     37.7  56.9   40.6   19.1  42.5  53.2  32
    YOLOF-ResNet101      800×*     39.8  59.4   42.9   20.5  44.5  54.9  21
    RDSNet               800×800   38.1  58.5   40.8   21.2  41.5  48.2  10.9
    Yolov3               608×608   33.0  57.9   34.4   18.3  35.4  41.9  20
    Yolov3-SPP           608×608   36.2  60.6   38.2   20.6  37.4  46.1  73
    NAS-FPN              640×640   39.9  –      –      –     –     –     24
    EfficientDet-D1      640×640   39.6  58.6   42.3   17.9  44.3  56.0  50
    Yolov5s              640×640   36.7  55.4   39.0   21.1  41.9  45.5  204
    MODN-BAM             640×640   40.4  60.1   43.3   22.5  45.0  54.9  175

    Table  4  Ablation results on the full-scene surveillance dataset

    Method                 Frame size  mAP50 (%)  fps
    Yolov5s                1920×1080   85.7       182
    Yolov5s+Attention2     1920×1080   87.6       176
    Yolov5s+Multi-scale    1920×1080   88.4       163
    MODN-BAM               1920×1080   89.6       154
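The fps column in Table 4 is a throughput figure; inverting it gives the average processing time per 1920×1080 frame. The calculation below converts each row and shows that the attention and multi-scale additions cost MODN-BAM about 1 ms per frame over Yolov5s (roughly 6.5 ms versus 5.5 ms). The 25 fps real-time target is an assumption (a common camera frame rate), not a threshold stated in the paper.

```python
# Frames per second on 1920x1080 surveillance video, copied from Table 4
TABLE4_FPS = {
    "Yolov5s": 182,
    "Yolov5s+Attention2": 176,
    "Yolov5s+Multi-scale": 163,
    "MODN-BAM": 154,
}

# Hypothetical real-time target: a 25 fps camera stream (assumption, not from the paper)
TARGET_FPS = 25


def ms_per_frame(fps):
    """Average processing time per frame in milliseconds."""
    return 1000.0 / fps


def meets_realtime(fps, target_fps=TARGET_FPS):
    """True if the model processes frames at least as fast as the camera produces them."""
    return fps >= target_fps


baseline_ms = ms_per_frame(TABLE4_FPS["Yolov5s"])
for name, fps in TABLE4_FPS.items():
    extra = ms_per_frame(fps) - baseline_ms  # added latency relative to Yolov5s
    print(f"{name}: {ms_per_frame(fps):.2f} ms/frame (+{extra:.2f} ms), real-time: {meets_realtime(fps)}")
```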

Publication history
  • Received: 2021-07-02
  • Revised: 2021-12-25
  • Accepted: 2022-01-12
  • Published online: 2022-02-03
  • Issue date: 2022-09-19
