Crowd Counting Method Based on Multi-Scale Enhanced Network

Tao XU, Yinong DUAN, Jiahao DU, Caihua LIU

Citation: Tao XU, Yinong DUAN, Jiahao DU, Caihua LIU. Crowd Counting Method Based on Multi-Scale Enhanced Network[J]. Journal of Electronics & Information Technology, 2021, 43(6): 1764-1771. doi: 10.11999/JEIT200331


doi: 10.11999/JEIT200331
Funds: The Natural Science Foundation of Tianjin (18JCYBJC85100), The Fundamental Research Funds for the Central Universities from the Civil Aviation University of China (3122018C024), The Scientific Research Startup Project of the Civil Aviation University of China (2017QD16X)
    About the authors:

    Tao XU: Male, born in 1962; professor and Ph.D. supervisor; his research interests include intelligent information processing and image processing

    Yinong DUAN: Male, born in 1994; master's student; his research interests include computer vision and pattern recognition

    Jiahao DU: Male, born in 1994; master's student; his research interests include computer vision and pattern recognition

    Caihua LIU: Female, born in 1987; lecturer, Ph.D.; her research interests include machine learning and computer vision

    Corresponding author:

    Caihua LIU, chliu@cauc.edu.cn

  • CLC number: TN911.73; TP391.4

  • Abstract: Crowd counting research commonly relies on the Euclidean loss function, which tends to destroy local correlation in the estimated density map; in addition, existing methods do not fully extract the continuously varying scale features of crowd images. Both issues degrade the performance of crowd counting models. To address these problems, this paper proposes a crowd counting model based on a Multi-Scale Enhanced Network (MSEN). First, a regional discriminator network is introduced alongside the multi-branch generator network, and the two are combined into an embedded GAN module that strengthens the local correlation of the generated image. Next, a scale-enhancement module is designed on the basis of a pyramid pooling structure and attached after the embedded GAN module; it further extracts local features of different scales from different regions, coping as far as possible with the continuous variation of local scale in crowd images and thereby improving the generalization ability of the overall model. Finally, extensive experiments are conducted on three challenging public crowd counting datasets. The results show that the proposed model effectively improves the accuracy and robustness of crowd counting.
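
For intuition, here is a minimal sketch of what a pyramid-pooling based scale-enhancement submodule can look like, written in PyTorch and patterned on PSPNet-style pooling [15]: features are pooled at several grid sizes, re-projected, upsampled, and fused so each position sees context at multiple local scales. All class names, bin sizes, and channel widths below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleEnhanceModule(nn.Module):
    """Hypothetical scale-enhancement submodule (names are illustrative)."""

    def __init__(self, channels: int, bins=(1, 2, 3, 6)):
        super().__init__()
        # One branch per pyramid level: pool to a b x b grid, then reduce
        # channels so the concatenated context matches the input width.
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(b),
                nn.Conv2d(channels, channels // len(bins), kernel_size=1, bias=False),
                nn.ReLU(inplace=True),
            )
            for b in bins
        )
        # Fuse the original features with the upsampled multi-scale context.
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        context = [
            F.interpolate(branch(x), size=(h, w), mode="bilinear", align_corners=False)
            for branch in self.branches
        ]
        return F.relu(self.fuse(torch.cat([x, *context], dim=1)))

if __name__ == "__main__":
    module = ScaleEnhanceModule(64)
    features = torch.randn(1, 64, 96, 128)  # stand-in for generator features
    print(module(features).shape)           # torch.Size([1, 64, 96, 128])
```

The module preserves the spatial resolution of its input, so several copies can be stacked after the generator, as in the E×1/E×2 variants of Table 5.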
  • Figure 1 Overall structure of the MSEN model

    Figure 2 Structure of the generator network

    Figure 3 Structure of the scale-enhancement submodule

    Figure 4 Examples of predicted density maps and estimated counts for the standalone GAN structure and the MSEN structure

    Figure 5 Model MAE under different values of β
    Table 1 Comparison of basic dataset information

    | Dataset | Images | Resolution | Min count | Max count | Density level |
    |---|---|---|---|---|---|
    | ShanghaiTech Part_A | 482 | varies | 33 | 3139 | |
    | ShanghaiTech Part_B | 716 | 768×1024 | 9 | 578 | |
    | UCF_CC_50 | 50 | varies | 94 | 4543 | extremely high |
    | UCF-QNRF | 1535 | varies | 49 | 12865 | extremely high |
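
For reference, the MAE and MSE reported in Tables 2–4 are the standard crowd-counting metrics; their definitions below are the conventional ones and are not restated on this page. Note that in this literature "MSE" conventionally denotes the root of the mean squared counting error. For $N$ test images, with ground-truth count $C_i$ and predicted count $\hat{C}_i$ (obtained by integrating the estimated density map):

$$ \mathrm{MAE}=\frac{1}{N}\sum_{i=1}^{N}\left|\hat{C}_i-C_i\right|,\qquad \mathrm{MSE}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{C}_i-C_i\right)^{2}} $$

MAE reflects the accuracy of the estimates, while MSE, being more sensitive to large errors, reflects their robustness.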

    Table 2 Experimental results on the ShanghaiTech dataset

    | Method | Part_A MAE | Part_A MSE | Part_B MAE | Part_B MSE |
    |---|---|---|---|---|
    | MCNN[8] | 110.2 | 173.2 | 26.4 | 41.3 |
    | Switch-CNN[9] | 90.4 | 135.0 | 21.6 | 33.4 |
    | ACSCP[10] | 75.7 | 102.7 | 17.2 | 27.4 |
    | CSRNet[11] | 68.2 | 115.0 | 10.6 | 16.0 |
    | SANet[12] | 67.0 | 104.5 | 8.4 | 13.6 |
    | PACNN[19] | 66.3 | 106.4 | 8.9 | 13.5 |
    | TEDnet[20] | 64.2 | 109.1 | 8.2 | 12.8 |
    | MSEN (ours) | 63.5 | 106.2 | 8.2 | 12.3 |

    Table 3 Experimental results on the UCF_CC_50 dataset

    | Method | MAE | MSE |
    |---|---|---|
    | MCNN[8] | 377.6 | 509.1 |
    | Switch-CNN[9] | 318.1 | 439.2 |
    | ACSCP[10] | 291.0 | 404.6 |
    | CSRNet[11] | 266.1 | 397.5 |
    | PACNN[19] | 267.9 | 357.8 |
    | SANet[12] | 258.4 | 334.9 |
    | TEDnet[20] | 249.4 | 354.5 |
    | MSEN (ours) | 226.7 | 310.6 |

    Table 4 Experimental results on the UCF-QNRF dataset

    | Method | MAE | MSE |
    |---|---|---|
    | MCNN[8] (CL) | 277.0 | 426.0 |
    | Switch-CNN[9] (CL) | 228.0 | 445.0 |
    | CL[17] | 132.0 | 191.0 |
    | TEDnet[20] | 113.0 | 188.0 |
    | MSEN (ours) | 114.1 | 159.5 |

    (CL) marks results as reported in the composition-loss paper [17].

    Table 5 Models with different structures and their results

    | No. | Structure | Embedded discriminator | Scale-enhancement submodules | Skip connection | MAE |
    |---|---|---|---|---|---|
    | (1) | G | | | | 67.5 |
    | (2) | GAN | | | | 65.6 |
    | (3) | GAN* (E×1) | | 1 | | 65.3 |
    | (4) | GAN* (E×1+S) | | 1 | ✓ | 65.2 |
    | (5) | GAN* (E×2) | | 2 | | 66.5 |
    | (6) | GAN* (E×2+S) | | 2 | ✓ | 66.4 |
    | (7) | Embedded GAN + E×1 | ✓ | 1 | | 65.0 |
    | (8) | Embedded GAN + E×1+S | ✓ | 1 | ✓ | 64.7 |
    | (9) | Embedded GAN + E×2 | ✓ | 2 | | 64.1 |
    | (10) | Embedded GAN + E×2+S (MSEN) | ✓ | 2 | ✓ | 63.5 |
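
As a reading aid for the row labels, the following hypothetical sketch mirrors row (10) of Table 5 (embedded GAN + E×2+S): generator features pass through two scale-enhancement submodules, and a skip connection adds the generator features back before a 1×1 density head. It reuses the ScaleEnhanceModule sketch given after the abstract; the generator stub below merely stands in for the paper's multi-branch network, and all names are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class StubGenerator(nn.Module):
    """Stand-in for the multi-branch generator of the embedded GAN module."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

class MSENSketch(nn.Module):
    """Row (10) of Table 5: embedded GAN generator + E×2 + skip connection (S)."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.generator = StubGenerator(channels)
        self.enhance = nn.Sequential(                  # E×2: two stacked submodules
            ScaleEnhanceModule(channels),              # defined in the earlier sketch
            ScaleEnhanceModule(channels),
        )
        self.head = nn.Conv2d(channels, 1, kernel_size=1)  # density map head

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.generator(x)
        out = self.enhance(feat) + feat                # skip connection "S"
        return self.head(out)                          # sum over H×W gives the count
```

Dropping the skip addition reproduces rows (7) and (9); replacing the stacked pair with a single submodule reproduces the E×1 variants.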
References
[1] CHEN Peng, TANG Yiping, WANG Liran, et al. Crowd density estimation based on multi-level feature fusion[J]. Journal of Image and Graphics, 2018, 23(8): 1181–1192. doi: 10.11834/jig.180017
[2] XIE Weidi, NOBLE J A, and ZISSERMAN A. Microscopy cell counting and detection with fully convolutional regression networks[J]. Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, 2018, 6(3): 283–292. doi: 10.1080/21681163.2016.1149104
[3] ZUO Jing and DOU Xiangsheng. Models and applications of video-based vehicle classification and counting[J]. Operations Research and Management Science, 2020, 29(1): 124–130.
[4] CUI Kai, HU Cheng, WANG Rui, et al. Deep-learning-based extraction of the animal migration patterns from weather radar images[J]. Science China Information Sciences, 2020, 63(4): 140304. doi: 10.1007/s11432-019-2800-0
[5] SUN Yanjing, SHI Yunkai, YUN Xiao, et al. Adaptive strategy fusion target tracking based on multi-layer convolutional features[J]. Journal of Electronics & Information Technology, 2019, 41(10): 2464–2470. doi: 10.11999/JEIT180971
[6] PU Lei, FENG Xinxi, HOU Zhiqiang, et al. Robust visual tracking based on spatial reliability constraint[J]. Journal of Electronics & Information Technology, 2019, 41(7): 1650–1657. doi: 10.11999/JEIT180780
[7] ZHANG Cong, LI Hongsheng, WANG Xiaogang, et al. Cross-scene crowd counting via deep convolutional neural networks[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 833–841. doi: 10.1109/CVPR.2015.7298684
[8] ZHANG Yingying, ZHOU Desen, CHEN Siqin, et al. Single-image crowd counting via multi-column convolutional neural network[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 589–597. doi: 10.1109/CVPR.2016.70
[9] SAM D B, SURYA S, and BABU R V. Switching convolutional neural network for crowd counting[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 4031–4039.
[10] SHEN Zan, XU Yi, NI Bingbing, et al. Crowd counting via adversarial cross-scale consistency pursuit[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5245–5254. doi: 10.1109/CVPR.2018.00550
[11] LI Yuhong, ZHANG Xiaofan, and CHEN Deming. CSRNet: Dilated convolutional neural networks for understanding the highly congested scenes[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1091–1100. doi: 10.1109/CVPR.2018.00120
[12] CAO Xinkun, WANG Zhipeng, ZHAO Yanyun, et al. Scale aggregation network for accurate and efficient crowd counting[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 757–773.
[13] SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. The International Conference on Learning Representations, San Diego, USA, 2015: 1–14.
[14] ISOLA P, ZHU Junyan, ZHOU Tinghui, et al. Image-to-image translation with conditional adversarial networks[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 5967–5976.
[15] ZHAO Hengshuang, SHI Jianping, QI Xiaojuan, et al. Pyramid scene parsing network[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6230–6239. doi: 10.1109/CVPR.2017.660
[16] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 2547–2554. doi: 10.1109/CVPR.2013.329
[17] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 544–559. doi: 10.1007/978-3-030-01216-8_33
[18] QU Yanyun, CHEN Yizi, HUANG Jingying, et al. Enhanced pix2pix dehazing network[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 8152–8160. doi: 10.1109/CVPR.2019.00835
[19] SHI Miaojing, YANG Zhaohui, XU Chao, et al. Revisiting perspective information for efficient crowd counting[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 7271–7280.
[20] JIANG Xiaolong, XIAO Zehao, ZHANG Baochang, et al. Crowd counting and density estimation by trellis encoder-decoder networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 6126–6135. doi: 10.1109/CVPR.2019.00629
Publication history
  • Received: 2020-04-28
  • Revised: 2020-10-12
  • Available online: 2020-10-16
  • Issue published: 2021-06-18
