Multi-scale Face Detection Based on Single Neural Network

LIU Hongzhe; YANG Shaopeng; YUAN Jiazheng; WANG Xueqiao; XUE Jianming

doi:10.11999/JEIT180163

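1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China
2. Beijing Open University, Beijing 100081, China
3. Institute of Computer Technology, Beijing Union University, Beijing 100101, China

Funds: The National Natural Science Foundation of China (61571045), The Supporting Plan for Cultivating High Level Teachers in Colleges and Universities in Beijing (IDHT20170511), The National Science and Technology Support Project (2015BAH55F03), The Foundation of Beijing Union University (Zk10201703), The Foundation of Beijing Municipal Education Commission (KM201811417002)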

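About the authors:
LIU Hongzhe: female, born in 1971, professor and master's supervisor. Research interests: digital image processing, tourism informatization.

YANG Shaopeng: male, born in 1990, master's student. Research interest: pattern recognition.

YUAN Jiazheng: male, born in 1971, professor and doctoral supervisor. Research interests: digital image processing, visual computing and positioning technology.

WANG Xueqiao: female, born in 1986, lecturer. Research interest: pattern recognition.

XUE Jianming: male, born in 1992, master's student. Research interest: pattern recognition.

Corresponding author:
YANG Shaopeng, shaopeng568@163.com

CLC number: TP391.4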
Publication history
- Received: 2018-02-07
- Revised: 2018-07-05
- Published online: 2018-07-23
- Issue date: 2018-11-01


Abstract: Face detection finds and locates all faces in an input image and returns their positions and sizes; it is an important branch of object detection. To address the difficulty that the diversity of face scales causes for face detection, this paper proposes a new multi-scale face detection algorithm based on a single neural network with feature map fusion. The method predicts faces on convolutional layers of different sizes, which enables real-time multi-scale detection, and it fuses shallow feature maps to introduce contextual information and improve detection accuracy for small faces. Experiments on the FDDB and WIDER FACE datasets show that the method reaches the accuracy of state-of-the-art face detectors, and because it removes the box proposal step it also runs faster. On the hard, medium and easy subsets of WIDER FACE it achieves 87.9%, 93.2% and 93.4% mean average precision (MAP), respectively, at a detection speed of 35 fps. Compared with the HR model, a well-performing detector for tiny faces, the proposed method improves detection speed while maintaining accuracy.

Keywords:
- Multi-scale face detection
- Contextual information
- Feature map fusion
- Convolutional neural network
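
The fusion idea in the abstract, where a deeper feature map is brought up to the resolution of a shallower one and combined with it, can be illustrated with a small pure-Python sketch. This is only an illustration under stated assumptions: the paper's module uses a learned deconvolution (see the caption of Fig. 4), whereas the sketch substitutes nearest-neighbour 2x upsampling, and the element-wise sum is an assumed combination rule that this page does not confirm. The function names `upsample2x` and `fuse` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D feature map (list of rows).

    Stand-in for the learned deconvolution used in the paper.
    """
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


def fuse(shallow, deep):
    """Upsample `deep` to the size of `shallow` and add element-wise.

    The sum is an assumption; the actual fusion operator is not stated here.
    """
    up = upsample2x(deep)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("shallow map must be exactly twice the size of deep map")
    return [
        [s + d for s, d in zip(shallow_row, deep_row)]
        for shallow_row, deep_row in zip(shallow, up)
    ]


# Toy 4x4 shallow map (fine detail) and 2x2 deep map (coarse context).
shallow = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
deep = [
    [10, 20],
    [30, 40],
]
fused = fuse(shallow, deep)
for row in fused:
    print(row)
```

Each cell of the fused map keeps its fine-resolution value and gains the value of the coarse cell that covers it, which is one simple way to picture how context from a deeper layer can reach the layer that detects small faces.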

Fig. 1 SSD network structure

Fig. 2 Default detection boxes [16]

Fig. 3 Adding contextual information [19]

Fig. 4 Deconvolution fusion module

Fig. 5 Network structure of multi-scale face detection based on feature map fusion

Fig. 6 Feature map fusion model

Fig. 7 Test result curves

Fig. 8 ROC curves on FDDB

Fig. 9 Examples of detection results

Table 1 Detection box parameters

Feature layer	Stride n	Detection box size	Aspect ratio
conv3_3	4	16	1
conv4_3	8	32	1
conv5_3	16	64	1
conv7	32	128	1
conv8_2	64	256	1
conv9_2	128	512	1
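
The parameters in Table 1 can be turned into a concrete set of default boxes. The sketch below is a minimal illustration, not the authors' implementation. It assumes that stride and box size are in pixels, that each feature-map cell carries one square box centred on the cell (the usual SSD convention), and that the feature map side equals the image side divided by the stride. The 512-pixel input size and the names `TABLE_1`, `default_boxes` and `IMAGE_SIZE` are hypothetical.

```python
# Rows of Table 1: (feature layer, stride n, detection box size).
# Units are assumed to be pixels.
TABLE_1 = [
    ("conv3_3", 4, 16),
    ("conv4_3", 8, 32),
    ("conv5_3", 16, 64),
    ("conv7", 32, 128),
    ("conv8_2", 64, 256),
    ("conv9_2", 128, 512),
]


def default_boxes(stride, size, image_size):
    """Return one square box (cx, cy, w, h) per feature-map cell."""
    cells = image_size // stride  # assumed feature-map side length
    boxes = []
    for row in range(cells):
        for col in range(cells):
            cx = (col + 0.5) * stride  # cell centre in image coordinates
            cy = (row + 0.5) * stride
            boxes.append((cx, cy, size, size))  # aspect ratio 1
    return boxes


IMAGE_SIZE = 512  # hypothetical input size, not taken from the paper

counts = {
    name: len(default_boxes(stride, size, IMAGE_SIZE))
    for name, stride, size in TABLE_1
}
total = sum(counts.values())

for name, stride, size in TABLE_1:
    print(name, "boxes:", counts[name], "size/stride:", size // stride)
print("total default boxes:", total)
```

In every row of Table 1 the box size is four times the stride, so neighbouring boxes on the same layer overlap heavily, and each successive layer doubles both stride and box size.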

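Table 2 MAP comparison of different fusion schemes

Model	Dataset	MAP
Proposed fusion model	WIDER FACE (Hard)	0.879
Comparison model 1	WIDER FACE (Hard)	0.823
Comparison model 2	WIDER FACE (Hard)	0.836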

Table 3 MAP comparison of experimental results

Method	Hard	Medium	Easy	Detection speed (fps)
Faster R-CNN	0.712	0.845	0.897	<10
SSD-face	0.737	0.882	0.910	<43
HR	0.831	0.914	0.925	<5
Proposed method	0.879	0.932	0.934	<35
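
The comparison in Table 3 can be made explicit with a little arithmetic. The sketch below copies the table values and computes the MAP differences between the proposed method and two baselines. The speed column is reported with a "<" sign, so the fps figures are treated here as upper bounds and the resulting ratio is only indicative. The names `TABLE_3` and `map_gain` are hypothetical.

```python
# Rows of Table 3: (hard, medium, easy MAP, reported fps bound).
TABLE_3 = {
    "Faster R-CNN": (0.712, 0.845, 0.897, 10),
    "SSD-face": (0.737, 0.882, 0.910, 43),
    "HR": (0.831, 0.914, 0.925, 5),
    "Proposed": (0.879, 0.932, 0.934, 35),
}


def map_gain(method, baseline):
    """MAP difference (hard, medium, easy) of `method` over `baseline`."""
    ours = TABLE_3[method][:3]
    theirs = TABLE_3[baseline][:3]
    return tuple(round(a - b, 3) for a, b in zip(ours, theirs))


gain_vs_hr = map_gain("Proposed", "HR")
gain_vs_ssd = map_gain("Proposed", "SSD-face")

# Ratio of the reported fps bounds; indicative only, since both are "<" values.
speed_ratio_vs_hr = TABLE_3["Proposed"][3] / TABLE_3["HR"][3]

print("gain over HR:", gain_vs_hr)
print("gain over SSD-face:", gain_vs_ssd)
print("fps bound ratio vs HR:", speed_ratio_vs_hr)
```

The largest gains appear on the hard subset, which is consistent with the abstract's emphasis on small faces.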

References (26)

[1] JIANG Huaizu and LEARNED-MILLER E. Face detection with the Faster R-CNN[C]. IEEE International Conference on Automatic Face & Gesture Recognition, Washington, D.C., USA, 2017: 650–657.

[2] YANG Shuo, LUO Ping, LOY C C, et al. WIDER FACE: A face detection benchmark[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5525–5533.

[3] CROSSWHITE N, BYRNE J, STAUFFER C, et al. Template adaptation for face verification and identification[C]. IEEE International Conference on Automatic Face & Gesture Recognition, Washington, D.C., USA, 2017: 1–8.

[4] MAJUMDAR A, SINGH R, and VATSA M. Face verification via class sparsity based supervised encoding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1273–1280. doi: 10.1109/TPAMI.2016.2569436

[5] GAO Yuan, MA Jiayi, and YUILLE A L. Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples[J]. IEEE Transactions on Image Processing, 2017, 26(5): 2545–2560. doi: 10.1109/TIP.2017.2675341

[6] HARIS KHAN M, MCDONAGH J, and TZIMIROPOULOS G. Synergy between face alignment and tracking via discriminative global consensus optimization[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017: 3791–3799.

[7] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587.

[8] VIOLA P and JONES M. Rapid object detection using a boosted cascade of simple features[C]. IEEE Computer Society Conference on Computer Vision & Pattern Recognition, Kauai, USA, 2001: 511.

[9] LI Jianguo, WANG Tao, and ZHANG Yimin. Face detection using SURF cascade[C]. IEEE International Conference on Computer Vision Workshops, Ontario, Canada, 2012: 2183–2190.

[10] MATHIAS M, BENENSON R, PEDERSOLI M, et al. Face detection without bells and whistles[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 720–735.

[11] LI Haoxiang, LIN Zhe, SHEN Xiaohui, et al. A convolutional neural network cascade for face detection[C]. Computer Vision and Pattern Recognition, Boston, USA, 2015: 5325–5334.

[12] WU Shuzhe, KAN M, SHAN Shiguang, et al. Funnel-structured cascade for multi-view face detection with alignment-awareness[J]. Neurocomputing, 2016, 221(C): 138–145.

[13] YANG Shuo, LUO Ping, LOY C C, et al. Faceness-Net: Face detection through deep facial part responses[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(8): 1845–1859. doi: 10.1109/TPAMI.2017.2738644

[14] GIRSHICK R. Fast R-CNN[C]. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440–1448.

[15] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031

[16] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision, Amsterdam, Netherlands, 2016: 21–37.

[17] DAI Jifeng, LI Yi, HE Kaiming, et al. R-FCN: Object detection via region based fully convolutional networks[C]. Advances in Neural Information Processing Systems, Barcelona, Spain, 2016: 379–387.

[18] ZHU Chenchen, ZHENG Yutong, LUU K, et al. CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection[OL]. arXiv preprint arXiv:1606.05413, 2016.

[19] HU Peiyun and RAMANAN D. Finding tiny faces[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Hawaii, USA, 2017: 1522–1530.

[20] ERHAN D, SZEGEDY C, TOSHEV A, et al. Scalable object detection using deep neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 2147–2154.

[21] CHEN Chenyi, LIU Mingyu, TUZEL O, et al. R-CNN for small object detection[C]. Asian Conference on Computer Vision, Taipei, China, 2016: 214–230.

[22] BELL S, LAWRENCE ZITNICK C, BALA K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2874–2883.

[23] WONG R Y and HALL E L. Sequential hierarchical scene matching[J]. IEEE Transactions on Computers, 1978, 27(4): 359–366. doi: 10.1109/TC.1978.1675108

[24] FU C Y, LIU Wei, RANGA A, et al. DSSD: Deconvolutional single shot detector[OL]. arXiv preprint arXiv:1701.06659, 2017.

[25] WEI Xiang, ZHANG Dongqing, YU H, et al. Context-aware single-shot detector[C]. IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, USA, 2018: 1784–1793.

[26] HOWARD A G. Some improvements on deep convolutional neural network based image classification[OL]. arXiv preprint arXiv:1312.5402, 2013.
