A Real-time Detection Method for Multi-scale Pedestrians in Complex Environment

ZHOU Weina (周薇娜); SUN Lihua (孙丽华); XU Zhijing (徐志京)

doi:10.11999/JEIT200436

cstr: 32379.14.JEIT200436

Shanghai Maritime University, Shanghai 201306, China

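Funds: National Natural Science Foundation of China (61404083, 52071200), China Postdoctoral Science Foundation (2015M581527), Open Research Project of the State Key Laboratory of ASIC & System (2021KF010)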

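About the authors:
ZHOU Weina: female, born in 1982, associate professor; research interests: image processing, circuits and embedded systems, artificial intelligence

SUN Lihua: female, born in 1995, master's student; research interests: pattern recognition and image processing

XU Zhijing: male, born in 1972, associate professor; research interests: maritime intelligent transportation systems, information acquisition and intelligent processing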

Corresponding author:
ZHOU Weina, wnzhou@shmtu.edu.cn

CLC number: TN911.73
Publication history
- Received: 2020-06-01
- Revised: 2020-12-01
- Published online: 2021-03-31
- Issue date: 2021-07-10

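Abstract

Pedestrian detection is a classic topic in computer vision and image processing. It has wide applications in intelligent driving, video surveillance, and related fields. Common pedestrian detection methods based on visible-light or infrared images still perform unsatisfactorily in complex environments and conditions. These include rain, haze, occlusion, changing illumination, and large differences in target scale, and the shortfall shows in both detection accuracy and detection speed. This paper observes that pedestrians have markedly different features in visible-light and infrared detection systems, and that each modality has its own advantages in different environments. Building on this observation and on multi-scale feature extraction, it proposes the Fusion Pedestrian Detection Network (FPDNet), a real-time method for detecting multi-scale pedestrians in diverse complex environments. The network consists of three parts: a feature extraction backbone network, a multi-scale detection network, and a decision-level fusion network. It adaptively extracts multi-scale pedestrians against visible-light or infrared backgrounds. Experimental results show that the network adapts well to a variety of complex visual environments. Its detection accuracy and speed both meet the requirements of practical applications, with 91.29% mAP at 68 FPS in the model comparison of Table 3.
Keywords:
- Pedestrian detection
- Complex environment
- Adaptive extraction
- Multi-scale
- Decision-level fusion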

Figure 1 Top-level block diagram of FPDNet

Figure 2 Internal structure of the multi-scale detection network

Figure 3 Basic unit of the backbone network

Figure 4 Structure of the SPP layer
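Figure 4 itself is not reproduced here. The code below is a minimal sketch of the spatial pyramid pooling variant popularised by YOLOv3-SPP, which differs from the original fixed-bin pyramid of He et al. In this variant, the input is max-pooled at several kernel sizes with stride 1 and "same" padding. The pooled maps are then concatenated with the input along the channel axis. Resolution is preserved, and the channel count grows by a factor of 1 + (number of kernels). The kernel sizes (5, 9, 13) and the helper names are assumptions rather than values taken from the paper. Feature maps are plain nested lists, so the example needs no third-party libraries.

```python
from typing import List, Sequence

FeatureMap = List[List[float]]  # a single channel, H rows x W columns


def max_pool_same(fmap: FeatureMap, k: int) -> FeatureMap:
    """Stride-1 max pooling with 'same' padding for odd k; H and W are preserved."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out: FeatureMap = []
    for i in range(h):
        row = []
        for j in range(w):
            # Clipping the window at the border is equivalent to padding with -inf.
            window = (fmap[y][x]
                      for y in range(max(0, i - r), min(h, i + r + 1))
                      for x in range(max(0, j - r), min(w, j + r + 1)))
            row.append(max(window))
        out.append(row)
    return out


def spp(channels: List[FeatureMap], kernels: Sequence[int] = (5, 9, 13)) -> List[FeatureMap]:
    """Concatenate the input channels with one max-pooled copy per kernel size."""
    out = list(channels)  # identity branch comes first
    for k in kernels:
        out.extend(max_pool_same(c, k) for c in channels)
    return out


if __name__ == "__main__":
    # 13 x 13 single-channel map with a single peak in the centre.
    fmap = [[0.0] * 13 for _ in range(13)]
    fmap[6][6] = 1.0
    pooled = spp([fmap])
    print(len(pooled), len(pooled[0]), len(pooled[0][0]))
    # Every 13 x 13 window centred inside the map covers the centre cell.
    print(all(v == 1.0 for row in pooled[3] for v in row))
```

Running it prints `4 13 13` and `True`. Every cell of the 13×13-pooled channel sees the central peak, which is how this form of SPP mixes wide-context information into each location without changing the feature map size.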

Figure 5 Multi-scale detection module
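Table 2 assigns three anchors to each of three detection layers. YOLOv3, one of the baselines in Table 3, uses the same layout, so a YOLOv3-style head is a plausible reading of the multi-scale detection module. This excerpt does not confirm it. The sketch below makes that assumption and also assumes a 416×416 input, consistent with the 208×208 output of the first stride-2 layer in Table 1. Under those assumptions, a raw prediction (t_x, t_y, t_w, t_h) in grid cell (c_x, c_y) decodes to the following box:

- centre at ((σ(t_x) + c_x)·s, (σ(t_y) + c_y)·s);
- size (p_w·e^(t_w), p_h·e^(t_h));
- s = 416 / grid is the stride, and (p_w, p_h) is the anchor.

The function name `decode` and its argument layout are illustrative only.

```python
import math
from typing import Tuple

INPUT_SIZE = 416  # assumed input size (Table 1: first 7x7/2 conv outputs 208x208)

# Anchor (width, height) pairs in pixels from Table 2, keyed by detection grid size.
ANCHORS = {
    13: [(41, 103), (53, 138), (77, 205)],
    26: [(30, 74), (30, 94), (35, 84)],
    104: [(20, 30), (20, 51), (27, 61)],
}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode(t: Tuple[float, float, float, float], cell: Tuple[int, int],
           grid: int, anchor_idx: int) -> Tuple[float, float, float, float]:
    """Decode raw offsets (tx, ty, tw, th) into (centre_x, centre_y, width, height) in pixels."""
    tx, ty, tw, th = t
    cx, cy = cell                     # column and row of the responsible grid cell
    stride = INPUT_SIZE / grid        # 32, 16 or 4 pixels per cell
    pw, ph = ANCHORS[grid][anchor_idx]
    bx = (sigmoid(tx) + cx) * stride  # sigmoid keeps the centre inside its cell
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw)            # width and height rescale the anchor
    bh = ph * math.exp(th)
    return bx, by, bw, bh


if __name__ == "__main__":
    for grid in ANCHORS:
        print(f"{grid}x{grid} grid: stride {INPUT_SIZE / grid:g} px")
    # Zero offsets in cell (6, 6) of the 13x13 grid return the bare first anchor.
    print(decode((0.0, 0.0, 0.0, 0.0), (6, 6), 13, 0))  # (208.0, 208.0, 41.0, 103.0)
```

Under this reading, the 104×104 layer has a stride of only 4 pixels. That fits its role of localising the smallest (about 20×30-pixel) pedestrians.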

Figure 6 Object detection pipeline based on decision-level fusion
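This excerpt does not spell out the decision rule behind Figure 6. The sketch below shows one generic form of decision-level fusion. It is offered as an assumption, not as FPDNet's actual rule. The steps are:

- pool the detections from the visible-light and infrared branches;
- scale each confidence by a per-modality weight (for example, lowering the visible-light weight at night or in haze);
- discard low scores;
- remove duplicates of the same pedestrian with greedy non-maximum suppression.

The weights, the 0.25 score threshold, the 0.5 IoU threshold, and the function names `iou`, `nms` and `fuse` are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Det = Tuple[Box, float]                  # (box, confidence)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Greedy NMS: visit boxes by descending score, keep those not overlapping a kept box."""
    kept: List[Det] = []
    for box, score in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(box, k) < iou_thr for k, _ in kept):
            kept.append((box, score))
    return kept


def fuse(visible: List[Det], infrared: List[Det], w_vis: float = 1.0, w_ir: float = 1.0,
         score_thr: float = 0.25, iou_thr: float = 0.5) -> List[Det]:
    """Hypothetical decision-level fusion: weight each modality, pool, threshold, then NMS."""
    pooled = [(b, s * w_vis) for b, s in visible] + [(b, s * w_ir) for b, s in infrared]
    pooled = [d for d in pooled if d[1] >= score_thr]
    return nms(pooled, iou_thr)


if __name__ == "__main__":
    vis = [((10, 10, 50, 110), 0.9)]
    ir = [((12, 12, 52, 112), 0.8), ((200, 50, 240, 150), 0.7)]
    print(fuse(vis, ir))             # daytime: both branches trusted equally
    print(fuse(vis, ir, w_vis=0.2))  # night: visible-light scores down-weighted
```

Fusing at the decision level keeps the two branches independent, so either one can still produce detections when the other modality is degraded. The trade-off is that the branches cannot share intermediate features.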

Figure 7 Pedestrian detection results on four experimental images

Figure 8 Comparison of fusion detection results

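Table 1 Structure of the backbone network (Max: max pooling; Res: residual shortcut)

Repeats	Type	Filters	Kernel size/stride	Output feature map size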
	Conv	64	7×7/2	208×208
	Max		2×2/2	104×104
3	Conv	64	3×3/1
	Conv	64	3×3/1
	Res			104×104
	Conv	128	3×3/2
	Conv	128	3×3/1
	Res			52×52
3	Conv	128	3×3/1
	Conv	128	3×3/1
	Res			52×52
	Conv	256	3×3/2
	Conv	256	3×3/1
	Res			26×26
4	Conv	256	3×3/1
	Conv	256	3×3/1
	Res			26×26
	Conv	512	3×3/2
	Conv	512	3×3/1
	Res			13×13
2	Conv	512	3×3/1
	Conv	512	3×3/1
	Res			13×13

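The output sizes in Table 1 follow from the usual size rule for convolution and pooling layers: out = floor((n + 2p - k) / s) + 1, for input size n, kernel k, stride s and padding p. The check below assumes a 416×416 input, since the first 7×7/2 layer produces 208×208. It also assumes padding floor(k/2) for the convolutions and no padding for the 2×2 max pooling. Both padding choices are assumptions, because the table lists only kernel sizes and strides. Only the stride-2 layers change the resolution, and together they downsample by a factor of 32.

```python
from typing import List, Tuple


def out_size(n: int, k: int, s: int, p: int) -> int:
    """Output side length of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1


# (label, kernel, stride, padding) for the layers of Table 1 that change resolution.
# Padding values are assumptions; 3x3/1 convs with padding 1 keep the size unchanged.
DOWNSAMPLING = [
    ("Conv 7x7/2, 64", 7, 2, 3),
    ("Max 2x2/2", 2, 2, 0),
    ("Conv 3x3/2, 128", 3, 2, 1),
    ("Conv 3x3/2, 256", 3, 2, 1),
    ("Conv 3x3/2, 512", 3, 2, 1),
]


def trace(n: int = 416) -> List[Tuple[str, int]]:
    """Follow the spatial size through the downsampling layers, starting from n x n."""
    sizes = []
    for label, k, s, p in DOWNSAMPLING:
        n = out_size(n, k, s, p)
        sizes.append((label, n))
    return sizes


if __name__ == "__main__":
    for label, n in trace():
        print(f"{label:16s} -> {n}x{n}")
    print("3x3/1 conv on 52x52 ->", out_size(52, 3, 1, 1))
```

It prints 208, 104, 52, 26 and 13 in turn, matching the Output feature map size column, and confirms that a 3×3/1 convolution leaves 52×52 unchanged.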

Table 2 Widths and heights of the anchor (candidate) boxes, in pixels

Detection layer size	(width, height)	(width, height)	(width, height)
13×13	(41,103)	(53,138)	(77,205)
26×26	(30,74)	(30,94)	(35,84)
104×104	(20,30)	(20,51)	(27,61)

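Anchor sizes like those in Table 2 are usually obtained by clustering the widths and heights of the training boxes with K-means. The distance used is d = 1 - IoU, as introduced with YOLOv2, so that large boxes do not dominate the error. This excerpt does not give the paper's clustering settings, so the sketch below is illustrative only. The names `wh_iou`, `kmeans_anchors` and `split_by_layer`, the deterministic initialisation, and the iteration cap are all assumptions. The last step sorts the nine centroids by area and hands them out three at a time: the smallest three go to the 104×104 layer and the largest three to the 13×13 layer. Applied to Table 2's own anchors, this reproduces the table's grouping.

```python
from typing import Dict, List, Sequence, Tuple

WH = Tuple[float, float]


def wh_iou(a: WH, b: WH) -> float:
    """IoU of two boxes aligned at the same corner (only shape matters)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes: List[WH], k: int, iters: int = 100) -> List[WH]:
    """K-means on (w, h) pairs with distance 1 - IoU; returns centroids sorted by area."""
    # Deterministic init: k boxes spread evenly through the area-sorted data.
    ordered = sorted(boxes, key=lambda b: b[0] * b[1])
    step = len(ordered) / k
    centroids = [ordered[int(i * step)] for i in range(k)]
    for _ in range(iters):
        clusters: List[List[WH]] = [[] for _ in range(k)]
        for b in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            j = max(range(k), key=lambda c: wh_iou(b, centroids[c]))
            clusters[j].append(b)
        new = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return sorted(centroids, key=lambda b: b[0] * b[1])


def split_by_layer(anchors: List[WH], grids: Sequence[int] = (104, 26, 13)) -> Dict[int, List[WH]]:
    """Give the three smallest anchors to the finest grid, the next three to the next grid, etc."""
    return {g: anchors[3 * i:3 * i + 3] for i, g in enumerate(grids)}


if __name__ == "__main__":
    small = [(20, 30), (22, 32), (18, 28)]
    large = [(80, 200), (78, 210), (82, 190)]
    print(kmeans_anchors(small + large, 2))  # [(20.0, 30.0), (80.0, 200.0)]
```

With real training labels, k would be 9, and `split_by_layer` would then map the sorted centroids onto the three detection layers.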

Table 3 Comparison of network models

Model	mAP (%)	FPS
ACF+T+THOG	71.49	32
HalFus+TSDCNN	88.24	2.5
TSDCNN+Ada	89.03	1.3
SSD	88.01	42
YOLOv3	91.35	45
YOLOv3-tiny	80.57	155
FPDNet	91.29	68

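Frame rates are easier to compare as per-frame latency, 1000 / FPS milliseconds. The short script below recomputes Table 3 relative to YOLOv3, the most accurate entry. The 30 FPS real-time threshold is an assumption (a common video frame rate), not a figure from the paper. The function names are illustrative.

```python
# (model, mAP in %, frames per second), copied from Table 3.
RESULTS = [
    ("ACF+T+THOG", 71.49, 32),
    ("HalFus+TSDCNN", 88.24, 2.5),
    ("TSDCNN+Ada", 89.03, 1.3),
    ("SSD", 88.01, 42),
    ("YOLOv3", 91.35, 45),
    ("YOLOv3-tiny", 80.57, 155),
    ("FPDNet", 91.29, 68),
]

REALTIME_FPS = 30  # assumed threshold for "real-time" video, not from the paper


def latency_ms(fps: float) -> float:
    """Time to process one frame, in milliseconds."""
    return 1000.0 / fps


def compare(reference: str = "YOLOv3"):
    """Per model: (name, ms/frame, mAP difference vs reference, speed ratio, real-time?)."""
    ref_map, ref_fps = next((m, f) for name, m, f in RESULTS if name == reference)
    rows = []
    for name, m, f in RESULTS:
        rows.append((name, round(latency_ms(f), 1), round(m - ref_map, 2),
                     round(f / ref_fps, 2), f >= REALTIME_FPS))
    return rows


if __name__ == "__main__":
    print(f"{'model':14s} {'ms/frame':>8s} {'dmAP':>6s} {'speed':>6s}  real-time")
    for name, ms, dmap, speed, rt in compare():
        print(f"{name:14s} {ms:8.1f} {dmap:+6.2f} {speed:6.2f}x {rt}")
```

For FPDNet this gives about 14.7 ms per frame against 22.2 ms for YOLOv3. That is roughly 1.51 times the speed for 0.06 points less mAP. Under the same 30 FPS threshold, ACF+T+THOG also counts as real-time, but only at 71.49% mAP.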
