MSIANet: Multi-scale Interactive Attention Crowd Counting Network

ZHANG Shihui, ZHAO Weibo, WANG Lei, WANG Wei, LI Qunpeng

Citation: ZHANG Shihui, ZHAO Weibo, WANG Lei, WANG Wei, LI Qunpeng. MSIANet: Multi-scale Interactive Attention Crowd Counting Network[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2236-2245. doi: 10.11999/JEIT220644


doi: 10.11999/JEIT220644
Funds: The Central Government Guided Local Funds for Science and Technology Development (216Z0301G), The Natural Science Foundation of Hebei Province in China (F2019203285), The Innovation Capability Improvement Plan Project of Hebei Province (22567626H)
More Information
    About the authors:

    ZHANG Shihui: Male, Professor, Doctoral supervisor. His research interests include visual information processing and pattern recognition

    ZHAO Weibo: Male, M.S. candidate. His research interests include crowd counting and computer vision

    WANG Lei: Male, Ph.D. candidate. His research interests include sketch recognition and computer vision

    WANG Wei: Male, M.S. candidate. His research interests include crowd counting and computer vision

    LI Qunpeng: Male, M.S. candidate. His research interests include crowd counting and computer vision

    Corresponding author:

    ZHAO Weibo, zhaowb@stumail.ysu.edu.cn

  • CLC number: TN911.73; TP391.41

  • Abstract: Scale variation, occlusion, and complex backgrounds make estimating the number of people in congested scenes a challenging task. To handle the scale variation in crowd images and the limited scale range and feature-similarity problems of existing multi-column networks, a Multi-Scale Interactive Attention crowd counting Network (MSIANet) is proposed. First, a multi-scale attention module is designed: it extracts features at different scales through four branches with different receptive fields and lets the scale features extracted by the branches interact, while an attention mechanism is used to restrain the feature-similarity problem of multi-column networks. Second, a semantic information fusion module is designed on top of the multi-scale attention module: it lets semantic information from different levels of the backbone network interact, and stacks the multi-scale attention modules hierarchically so that multi-level semantic information is fully exploited. Finally, the multi-scale interactive attention crowd counting network is constructed from the multi-scale attention module and the semantic information fusion module; it makes full use of multi-level semantic information and multi-scale information to generate high-quality crowd density maps. Experimental results show that, compared with representative existing crowd counting methods, the proposed MSIANet effectively improves the accuracy and robustness of crowd counting.
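
The abstract above is the only technical description in this excerpt, so the following is offered purely as an illustrative sketch of the four-branch idea it describes, written in PyTorch. The class name, channel widths, dilation rates, the branch-interaction rule, and the squeeze-and-excitation-style gate are assumptions for illustration and are not the authors' released implementation.

```python
# Illustrative four-branch multi-scale attention block (PyTorch).
# Channel widths, dilation rates, the interaction rule and the gating scheme
# are assumptions for illustration, not the authors' released implementation.
import torch
import torch.nn as nn


class MultiScaleAttentionBlock(nn.Module):
    """Four parallel branches with different receptive fields, simple
    branch interaction, and a channel gate to suppress redundant responses."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Four 3x3 branches whose dilation rates grow, enlarging the receptive field.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in (1, 2, 3, 4)
        ])
        # Squeeze-and-excitation-style channel gate over the concatenated branches.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(4 * out_ch, out_ch // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch // 2, 4 * out_ch, kernel_size=1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        # Scale interaction: each branch is enriched with the previous branch's output.
        for i in range(1, len(feats)):
            feats[i] = feats[i] + feats[i - 1]
        stacked = torch.cat(feats, dim=1)
        stacked = stacked * self.gate(stacked)  # re-weight channels across branches
        return self.fuse(stacked)               # fused multi-scale feature map


if __name__ == "__main__":
    block = MultiScaleAttentionBlock(64, 64)
    out = block(torch.randn(1, 64, 96, 128))
    print(out.shape)  # torch.Size([1, 64, 96, 128])
```

In the paper itself the attention components are described as global spatial and global channel attention mechanisms (Fig. 4 and Fig. 5); the simple channel gate above only stands in for that role, and the semantic information fusion module that stacks such blocks across backbone levels is not sketched here.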
  • Fig. 1  Multi-scale interactive attention crowd counting network

    Fig. 2  Multi-scale attention module

    Fig. 3  Multi-scale interaction structure

    Fig. 4  Global spatial attention mechanism

    Fig. 5  Global channel attention mechanism

    Fig. 6  Visualization of density maps on the ShanghaiTech Part A dataset

    Fig. 7  Visualization of density maps on the ShanghaiTech Part B dataset

    Fig. 8  Visualization of density maps on the UCF_QNRF dataset

    Fig. 9  Visualization of density maps on the UCF_CC_50 dataset

    Fig. 10  Comparison of estimated and ground-truth counts on ShanghaiTech Part A

    Fig. 11  Comparison of estimated and ground-truth counts on ShanghaiTech Part B

    Fig. 12  Comparison of estimated and ground-truth counts on UCF_QNRF

    Fig. 13  Comparison of estimated and ground-truth counts on UCF_CC_50

    Table 1  Evaluation with the MAE and RMSE metrics on three crowd counting benchmark datasets (bold indicates the best result; "–" means not reported)

| Method | ShanghaiTech A MAE | ShanghaiTech A RMSE | ShanghaiTech B MAE | ShanghaiTech B RMSE | UCF_QNRF MAE | UCF_QNRF RMSE | UCF_CC_50 MAE | UCF_CC_50 RMSE |
|---|---|---|---|---|---|---|---|---|
| MCNN [12] (2016) | 110.2 | 173.2 | 26.4 | 41.3 | 277.0 | 426.0 | 377.6 | 509.1 |
| SANet [13] (2018) | 67.0 | 104.5 | 8.4 | 13.6 | – | – | 258.4 | 334.9 |
| CSRNet [7] (2018) | 68.2 | 115.0 | 10.6 | 16.0 | – | – | 266.1 | 397.5 |
| Switch-CNN [14] (2017) | 90.4 | 135.0 | 21.6 | 33.4 | 228.0 | 445.0 | 318.1 | 439.2 |
| ADCrowdNet [19] (2019) | 63.2 | 98.9 | 8.2 | 15.7 | – | – | 266.4 | 358.0 |
| TEDNet [15] (2019) | 64.2 | 109.1 | 8.2 | 12.8 | 113.0 | 188.0 | 249.4 | 354.5 |
| EPA [16] (2020) | 60.9 | **91.6** | 7.9 | 11.6 | – | – | 205.1 | 342.1 |
| DUBNet [8] (2020) | 64.6 | 106.8 | 7.7 | 12.5 | 105.6 | 180.5 | 243.8 | 329.3 |
| DPDNet [17] (2021) | 66.6 | 120.3 | 7.9 | 12.4 | 126.8 | 208.6 | – | – |
| MLAttnCNN [20] (2021) | – | – | 7.5 | 11.6 | 101.0 | 175.0 | 200.8 | 273.8 |
| URC [9] (2021) | 72.8 | 111.6 | 12.0 | 18.7 | 128.1 | 218.0 | 293.9 | 443.0 |
| MPS [18] (2022) | 71.1 | 110.7 | 9.6 | 15.0 | – | – | – | – |
| AutoScale [10] (2022) | 65.8 | 112.1 | 8.6 | 13.9 | 104.4 | **174.2** | – | – |
| FusionCount [11] (2022) | 62.2 | 101.2 | 6.9 | 11.8 | – | – | – | – |
| MSIANet (this paper) | **55.6** | 99.2 | **6.6** | **11.0** | **94.8** | 184.6 | **194.5** | **273.3** |
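
For context, the MAE and RMSE columns in Table 1 are assumed to follow the standard crowd counting definitions (the paper's own equations are not included in this excerpt): for N test images with ground-truth count C_i and estimated count obtained by summing the predicted density map,

$$\mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{C}_i-C_i\right|,\qquad \mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{C}_i-C_i\right)^{2}}$$

Lower values are better; MAE measures counting accuracy, while RMSE penalizes large per-image errors more heavily and is therefore often read as a robustness indicator.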

    Table 2  Ablation study results

| Variant model | MAE | RMSE |
|---|---|---|
| MSIANet front-end network + back-end network | 63.4 | 105.2 |
| MSIANet front-end network + MSAM + back-end network | 58.5 | 101.8 |
| MSIANet w/o MSAM | 60.7 | 101.0 |
| MSIANet w/o GCAM | 57.3 | 100.6 |
| MSIANet w/o GSAM | 57.1 | 99.5 |
| MSIANet | 55.6 | 99.2 |
  • [1] XU Tao, DUAN Yinong, DU Jiahao, et al. Crowd counting method based on multi-scale enhanced network[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1764–1771. doi: 10.11999/JEIT200331
    [2] WAN Honglin, WANG Xiaomin, PENG Zhenwei, et al. Dense crowd counting algorithm based on new multi-scale attention mechanism[J]. Journal of Electronics & Information Technology, 2022, 44(3): 1129–1136. doi: 10.11999/JEIT210163
    [3] TOPKAYA I S, ERDOGAN H, and PORIKLI F. Counting people by clustering person detector outputs[C]. Proceedings of the 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Seoul, Korea (South), 2014: 313–318.
    [4] LI Min, ZHANG Zhaoxiang, HUANG Kaiqi, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection[C]. Proceedings of the 19th International Conference on Pattern Recognition, Tampa, USA, 2008: 1–4.
    [5] CHAN A B, LIANG Z S J, and VASCONCELOS N. Privacy preserving crowd monitoring: Counting people without people models or tracking[C]. Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, USA, 2008: 1–7.
    [6] CHEN Ke, LOY C C, GONG Shaogang, et al. Feature mining for localised crowd counting[C]. Proceedings of the British Machine Vision Conference, Surrey, UK, 2012: 21.1–21.11.
    [7] LI Yuhong, ZHANG Xiaofan, and CHEN Deming. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes[C]. Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1091–1100.
    [8] OH M H, OLSEN P, and RAMAMURTHY K N. Crowd counting with decomposed uncertainty[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 11799–11806. doi: 10.1609/aaai.v34i07.6852
    [9] XU Yanyu, ZHONG Ziming, LIAN Dongze, et al. Crowd counting with partial annotations in an image[C]. Proceedings of 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 15550–15559.
    [10] XU Chenfeng, LIANG Dingkang, XU Yongchao, et al. AutoScale: Learning to scale for crowd counting[J]. International Journal of Computer Vision, 2022, 130(2): 405–434. doi: 10.1007/s11263-021-01542-z
    [11] MA Yiming, SANCHEZ V, and GUHA T. FusionCount: Efficient crowd counting via multiscale feature fusion[C]. Proceedings of the IEEE International Conference on Image Processing, Bordeaux, France, 2022.
    [12] ZHANG Yingying, ZHOU Desen, CHEN Siqin, et al. Single-image crowd counting via multi-column convolutional neural network[C]. Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 589–597.
    [13] CAO Xinkun, WANG Zhipeng, ZHAO Yanyun, et al. Scale aggregation network for accurate and efficient crowd counting[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 757–773.
    [14] SAM D B, SURYA S, and BABU R V. Switching convolutional neural network for crowd counting[C]. Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 4031–4039.
    [15] JIANG Xiaolong, XIAO Zehao, ZHANG Baochang, et al. Crowd counting and density estimation by trellis encoder-decoder networks[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 6126–6135.
    [16] YANG Yifan, LI Guorong, DU Dawei, et al. Embedding perspective analysis into multi-column convolutional neural network for crowd counting[J]. IEEE Transactions on Image Processing, 2020, 30: 1395–1407. doi: 10.1109/TIP.2020.3043122
    [17] LIAN Dongze, CHEN Xianing, LI Jing, et al. Locating and counting heads in crowds with a depth prior[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, To be Published.
    [18] ZAND M, DAMIRCHI H, FARLEY A, et al. Multiscale crowd counting and localization by multitask point supervision[C]. Proceedings of 2022 IEEE International Conference on Acoustics, Speech and Signal Processing, Singapore, Singapore, 2022.
    [19] LIU Ning, LONG Yongchao, ZOU Changqing, et al. ADCrowdNet: An attention-injective deformable convolutional network for crowd understanding[C]. Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3220–3229.
    [20] TIAN Mengxiao, GUO Hao, and LONG Chengjiang. Multi-level attentive convolutional neural network for crowd counting[J]. arXiv: 2105.11422, 2021.
    [21] LIU Yichao, SHAO Zongru, and HOFFMANN N. Global attention mechanism: Retain information to enhance channel-spatial interactions[J]. arXiv: 2112.05561, 2021.
    [22] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 544–559.
    [23] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]. Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 2547–2554.
Publication History
  • Received Date: 2022-05-19
  • Revised Date: 2022-07-29
  • Available Online: 2022-08-22
  • Published: 2023-06-10
