Dense Crowd Counting Algorithm Based on New Multi-scale Attention Mechanism

WAN Honglin, WANG Xiaomin, PENG Zhenwei, BAI Zhiquan, YANG Xinghai, SUN Jiande

Citation: WAN Honglin, WANG Xiaomin, PENG Zhenwei, BAI Zhiquan, YANG Xinghai, SUN Jiande. Dense Crowd Counting Algorithm Based on New Multi-scale Attention Mechanism[J]. Journal of Electronics & Information Technology, 2022, 44(3): 1129-1136. doi: 10.11999/JEIT210163


doi: 10.11999/JEIT210163
Funds: The National Natural Science Foundation of China (61971271), The Key Research and Development Program of Shandong Province (2018GGX106008)
Details
    Author biographies:

    WAN Honglin: Male, born in 1979, Ph.D., Associate Professor. His main research interests include computer vision and artificial intelligence

    WANG Xiaomin: Female, born in 1998, Master's student. Her main research interests include image processing and crowd counting

    PENG Zhenwei: Male, born in 1995, Master. His main research interests include image processing and crowd counting

    BAI Zhiquan: Male, born in 1978, Professor and Ph.D. supervisor. His main research interests include cooperative communication and wireless optical communication

    SUN Jiande: Male, born in 1978, Professor and Ph.D. supervisor. His main research interests include multimedia information processing, analysis, understanding, and their applications

    Corresponding author:

    WAN Honglin, visage1979@sdu.edu.cn

  • CLC number: TP391

  • Abstract: Dense crowd counting is a classic problem in computer vision, but it still suffers from non-uniform scale, noise, and occlusion. This paper proposes a dense crowd counting method based on a new multi-scale attention mechanism. The deep network consists of a backbone network, a feature extraction network, and a feature fusion network. The feature extraction network contains a feature branch and an attention branch and adopts a new multi-scale module built from parallel convolution kernels, which captures crowd features at different scales more effectively and adapts to the non-uniform scale distribution of dense crowds. The feature fusion network uses an attention fusion module to enhance the output features of the feature extraction network, achieving effective fusion of attention features and image features and improving counting accuracy. Experiments on the public ShanghaiTech, UCF_CC_50, Mall, and UCSD datasets show that the proposed method outperforms existing methods in terms of both MAE and MSE.
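    The following is a minimal PyTorch sketch of the pipeline described in the abstract: a backbone, a feature branch and an attention branch built from a multi-scale module of parallel convolution kernels, and an attention fusion step. The kernel sizes, channel widths, class names, and the sigmoid-gated fusion are illustrative assumptions, not details taken from the paper.

        # A minimal sketch of the abstract's pipeline; hyper-parameters are assumptions.
        import torch
        import torch.nn as nn


        class MultiScaleBlock(nn.Module):
            """Parallel convolution branches with different kernel sizes,
            concatenated to capture crowd features at several scales."""
            def __init__(self, in_ch, branch_ch, kernel_sizes=(1, 3, 5, 7)):
                super().__init__()
                self.branches = nn.ModuleList([
                    nn.Sequential(
                        nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),
                        nn.ReLU(inplace=True),
                    )
                    for k in kernel_sizes
                ])

            def forward(self, x):
                return torch.cat([b(x) for b in self.branches], dim=1)


        class AttentionFusion(nn.Module):
            """Fuse the feature branch with a sigmoid attention map
            produced from the attention branch (element-wise gating)."""
            def __init__(self, in_ch):
                super().__init__()
                self.attn = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())

            def forward(self, feat, attn_feat):
                return feat * self.attn(attn_feat)


        class CrowdCounter(nn.Module):
            """Backbone -> (feature branch, attention branch) -> fusion -> density map."""
            def __init__(self):
                super().__init__()
                self.backbone = nn.Sequential(          # stand-in backbone
                    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                )
                self.feature_branch = MultiScaleBlock(64, 16)    # 4 branches x 16 = 64 channels
                self.attention_branch = MultiScaleBlock(64, 16)
                self.fusion = AttentionFusion(64)
                self.head = nn.Conv2d(64, 1, 1)          # predicted density map

            def forward(self, x):
                base = self.backbone(x)
                fused = self.fusion(self.feature_branch(base), self.attention_branch(base))
                return self.head(fused)


        # The crowd count estimate is the sum (integral) of the predicted density map.
        model = CrowdCounter()
        density = model(torch.randn(1, 3, 384, 512))
        print(density.sum().item())

    As in density-map-based counting generally, the estimated count is obtained by summing the predicted density map, which is what the last lines of the sketch illustrate.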
  • Figure 1  The proposed network architecture

    Figure 2  Basic feature extraction module, which is also used as the basic attention module in this paper

    Figure 3  Conventional Inception structure

    Figure 4  Improved Inception structure

    Figure 5  New multi-scale module

    Figure 6  Attention fusion module

    Figure 7  Density estimation maps, ground truth, and original images

    Table 1  Experimental results on the ShanghaiTech dataset

    Method              Part A MAE   Part A MSE   Part B MAE   Part B MSE
    MCNN[4]             110.2        173.2        26.4         41.3
    EDMNet[14]          76.5         100.2        15.4         26.3
    MSFNet[15]          63.4         97.2         9.6          14.3
    Switching-CNN[9]    90.4         135.0        21.6         33.4
    CSRNet[8]           68.2         115.0        10.6         16.0
    SCAR[21]            66.3         114.1        9.5          15.2
    MRA-CNN[22]         74.2         112.5        11.9         21.3
    ACSPNet[23]         85.2         137.1        15.4         23.1
    ACM-CNN[16]         72.2         103.5        17.5         22.7
    SFANet[24]          59.8         99.3         26.0         30.5
    FPNet[33]           108.6        126.3        26.0         30.5
    Proposed method     57.1         91.9         6.87         9.8
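    The MAE and MSE columns in these tables can be read with the usual crowd-counting convention, under the assumption that MSE here denotes the root of the mean squared counting error over the test images. A short illustrative computation (the example counts are made up):

        # Assumed metric definitions: MAE and root-mean-squared counting error.
        import math

        def mae_mse(pred_counts, gt_counts):
            errs = [p - g for p, g in zip(pred_counts, gt_counts)]
            mae = sum(abs(e) for e in errs) / len(errs)
            mse = math.sqrt(sum(e * e for e in errs) / len(errs))
            return mae, mse

        # e.g. predicted vs. ground-truth head counts on three test images
        print(mae_mse([512.3, 97.8, 1540.0], [500, 102, 1498]))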

    Table 2  Experimental results on UCF_CC_50

    Method              MAE      MSE
    MCNN[4]             377.6    509.1
    MSFNet[15]          257.2    380.8
    Switching-CNN[9]    318.1    439.2
    CSRNet[8]           266.1    397.5
    ic-CNN[25]          260.9    365.5
    SCAR[21]            259.0    374.0
    MRA-CNN[22]         240.8    352.6
    ACSPNet[23]         275.2    383.7
    ACM-CNN[16]         291.6    337.0
    SDA-MCNN[26]        306.6    313.2
    SFANet[24]          219.6    316.2
    FPNet[33]           463.0    501.6
    Proposed method     175.2    233.6

    Table 3  Experimental results on the Mall dataset

    Method                       MAE     MSE
    EDMNet[14]                   1.80    5.36
    R-FCN[27]                    6.02    5.46
    Faster R-CNN[28]             5.91    6.60
    Bidirectional ConvLSTM[29]   2.10    7.6
    DigCrowd[30]                 3.21    16.4
    ACM-CNN[16]                  2.3     3.1
    Proposed method              1.57    2.03

    Table 4  Experimental results on the UCSD dataset

    Method                       MAE     MSE
    MCNN[4]                      1.07    1.35
    Switching-CNN[9]             1.62    2.10
    Bidirectional ConvLSTM[29]   1.13    1.43
    ACSCP[31]                    1.04    1.35
    CSRNet[8]                    1.16    1.47
    SANet[32]                    1.02    1.29
    ACSPNet[23]                  1.02    1.28
    ACM-CNN[16]                  1.01    1.29
    SFANet[24]                   0.82    1.07
    FPNet[33]                    1.67    3.91
    Proposed method              0.97    1.27

    Table 5  Ablation study results

    Method                     MAE     MSE
    Backbone + D + M           58.6    96.6
    Backbone + D + M + C       57.8    92.7
    Backbone + ND + NM + C     57.1    91.9
  • [1] ARTETA C, LEMPITSKY V, and ZISSERMAN A. Counting in the wild[C]. Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 483–498.
    [2] ARTETA C, LEMPITSKY V, NOBLE J A, et al. Interactive object counting[C]. Proceedings of the 13th European Conference on Computer Vision, Zurich, Switzerland, 2014: 504–518.
    [3] SHANG Chong, AI Haizhou, and BAI Bo. End-to-end crowd counting via joint learning local and global count[C]. Proceedings of 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, USA, 2016: 1215–1219.
    [4] ZHANG Yingying, ZHOU Desen, CHEN Siqin, et al. Single-image crowd counting via multi-column convolutional neural network[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 589–597.
    [5] OÑORO-RUBIO D and LÓPEZ-SASTRE R J. Towards perspective-free object counting with deep learning[C]. Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 2016: 615–629.
    [6] MARSDEN M, MCGUINNESS K, LITTLE S, et al. ResnetCrowd: A residual deep learning architecture for crowd counting, violent behaviour detection and crowd density level classification[C]. Proceedings of the 14th IEEE International Conference on Advanced Video and Signal Based Surveillance, Lecce, Italy, 2017: 123–126.
    [7] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [8] LI Yuhong, ZHANG Xiaofan, and CHEN Deming. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1091–1100.
    [9] SAM D B, SURYA S, and BABU R V. Switching convolutional neural network for crowd counting[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4031–4039.
    [10] WANG Qi, GAO Junyu, LIN Wei, et al. Learning from synthetic data for crowd counting in the wild[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8190–8199.
    [11] LI Zhengqi, DEKEL T, COLE F, et al. Learning the depths of moving people by watching frozen people[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4516–4525.
    [12] WANG Shunzhou, LU Yao, ZHOU Tianfei, et al. SCLNet: Spatial context learning network for congested crowd counting[J]. Neurocomputing, 2020, 404: 227–239. doi: 10.1016/j.neucom.2020.04.139
    [13] CHEN Xinya, BIN Yanrui, GAO Changxin, et al. Relevant region prediction for crowd counting[J]. Neurocomputing, 2020, 407: 399–408. doi: 10.1016/j.neucom.2020.04.117
    [14] MENG Yuebo, JI Tuo, LIU Guanghui, et al. Encoding-decoding multi-scale convolutional neural network for crowd counting[J]. Journal of Xi'an Jiaotong University, 2020, 54(5): 149–157. (in Chinese)
    [15] ZUO Jing and BA Yulin. Deep crowd counting algorithm based on multi-scale fusion[J]. Laser & Optoelectronics Progress, 2020, 57(24): 307–315. (in Chinese)
    [16] ZOU Zhikang, CHENG Yu, QU Xiaoye, et al. Attend to count: Crowd counting with adaptive capacity multi-scale CNNs[J]. Neurocomputing, 2019, 367: 75–83. doi: 10.1016/j.neucom.2019.08.009
    [17] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
    [18] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]. Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 2547–2554.
    [19] CHEN Ke, LOY C C, GONG Shaogang, et al. Feature mining for localised crowd counting[C]. Proceedings of the British Machine Vision Conference, Surrey, UK, 2012: 3–5.
    [20] CHAN A B, LIANG Z S J, and VASCONCELOS N. Privacy preserving crowd monitoring: Counting people without people models or tracking[C]. Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, USA, 2008: 1–7.
    [21] GAO Junyu, WANG Qi, and YUAN Yuan. SCAR: Spatial-/channel-wise attention regression networks for crowd counting[J]. Neurocomputing, 2019, 363: 1–8. doi: 10.1016/j.neucom.2019.08.018
    [22] ZHANG Youmei, ZHOU Chunluan, CHANG Faliang, et al. Multi-resolution attention convolutional neural network for crowd counting[J]. Neurocomputing, 2018, 329: 144–152.
    [23] MA Junjie, DAI Yaping, and TAN Y P. Atrous convolutions spatial pyramid network for crowd counting and density estimation[J]. Neurocomputing, 2019, 350: 91–101. doi: 10.1016/j.neucom.2019.03.065
    [24] ZHU Liang, ZHAO Zhijian, LU Chao, et al. Dual path multi-scale fusion networks with attention for crowd counting[J]. arXiv: 1902.01115, 2019.
    [25] RANJAN V, LE H, and HOAI M. Iterative crowd counting[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 278–293.
    [26] YANG Biao, ZHAN Weiqin, WANG Nan, et al. Counting crowds using a scale-distribution-aware network and adaptive human-shaped kernel[J]. Neurocomputing, 2020, 390: 207–216. doi: 10.1016/j.neucom.2019.02.071
    [27] DAI Jifeng, LI Yi, HE Kaiming, et al. R-FCN: Object detection via region-based fully convolutional networks[C]. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 379–387.
    [28] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031
    [29] XIONG Feng, SHI Xingjian, and YEUNG D Y. Spatiotemporal modeling for crowd counting in videos[C]. Proceedings of 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5161–5169.
    [30] XU Mingliang, GE Zhaoyang, JIANG Xiaoheng, et al. Depth information guided crowd counting for complex crowd scenes[J]. Pattern Recognition Letters, 2019, 125: 563–569. doi: 10.1016/j.patrec.2019.02.026
    [31] SHEN Zan, XU Yi, NI Bingbing, et al. Crowd counting via adversarial cross-scale consistency pursuit[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5245–5254.
    [32] CAO Xinkun, WANG Zhipeng, ZHAO Yanyun, et al. Scale aggregation network for accurate and efficient crowd counting[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 757–773.
    [33] DENG Yuanzhi and HU Gang. Crowd density evaluation method based on feature pyramid[J]. Measurement & Control Technology, 2020, 39(6): 108–114. (in Chinese)
Publication history
  • Received: 2021-02-25
  • Revised: 2021-10-23
  • Accepted: 2021-11-05
  • Available online: 2021-11-11
  • Published: 2022-03-28
