Multi-scale Face Detection Based on a Single Neural Network
Abstract: Face detection is the task of finding and locating every face in an input image and returning the precise position and size of each face; it is an important branch of object detection. Faces vary widely in scale, and this makes them hard to detect. To address this, we propose a new single-shot, multi-scale face detection algorithm that uses a single neural network with feature-map fusion. The method predicts faces on convolutional layers of different resolutions, which enables real-time multi-scale detection. It also fuses feature maps in the shallow layers, which introduces contextual information and improves accuracy on small faces. Experiments on the FDDB and WIDER FACE datasets show that the method achieves accuracy competitive with state-of-the-art face detectors. Because the object proposal step is removed, the method is also fast. On the hard, medium, and easy WIDER FACE subsets it achieves 87.9%, 93.2%, and 93.4% Mean Average Precision (MAP) respectively, at 35 fps. Compared with HR, a current high-performing tiny-face detector, the method improves detection speed while maintaining accuracy.
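The abstract says only that shallow feature maps are fused with other maps to bring in context. This excerpt does not specify the fusion operator, and Table 2 compares several variants. The Python sketch below assumes one common choice: nearest-neighbour 2× upsampling of a deeper map, followed by element-wise addition to the shallower map.

Notes on the sketch:

- It uses plain nested lists indexed (channel, row, column), so it runs without a deep-learning framework.
- The functions `upsample_nearest_2x` and `fuse` are hypothetical names, not taken from the paper.
- A real network would typically also apply a learned convolution to match channel counts. That step is omitted here.

```python
from typing import List

# A feature map stored as nested lists indexed [channel][row][column].
FeatureMap = List[List[List[float]]]


def upsample_nearest_2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: every cell becomes a 2x2 block."""
    out: FeatureMap = []
    for channel in fmap:
        rows: List[List[float]] = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column twice
            rows.append(wide)
            rows.append(list(wide))  # repeat each row twice (copy, not alias)
        out.append(rows)
    return out


def fuse(shallow: FeatureMap, deep: FeatureMap) -> FeatureMap:
    """Assumed fusion: add the 2x-upsampled deep map to the shallow map element-wise.

    Both maps must have the same channel count, and the deep map must have
    exactly half the height and width of the shallow map.
    """
    up = upsample_nearest_2x(deep)
    if len(up) != len(shallow):
        raise ValueError("channel counts differ")
    fused: FeatureMap = []
    for s_ch, u_ch in zip(shallow, up):
        if len(s_ch) != len(u_ch) or len(s_ch[0]) != len(u_ch[0]):
            raise ValueError("spatial sizes differ after upsampling")
        fused.append([[a + b for a, b in zip(s_row, u_row)]
                      for s_row, u_row in zip(s_ch, u_ch)])
    return fused
```

As an example, fuse a 1×4×4 all-zero shallow map with the 1×2×2 deep map [[1, 2], [3, 4]]. Each deep value is copied into the 2×2 block it covers, giving the rows [1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4] and [3, 3, 4, 4]. Every shallow-layer cell thus receives a signal computed over a larger receptive field. This is the intuition behind adding context to help detect small faces.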
Figure 2 Default detection boxes [16]
Figure 3 Adding contextual information [19]
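Figure 2 and Table 1 describe SSD-style default boxes: each prediction layer has a stride n and a single square box size. The sketch below tiles one box per feature-map cell, centred at ((j + 0.5)·n, (i + 0.5)·n).

Two details are assumptions, not taken from the paper:

- The feature map has ceil(W/n) × ceil(H/n) cells.
- Boxes that extend past the image border are left unclipped.

`default_boxes` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

# (feature layer, stride n, box size) from Table 1; every aspect ratio there is 1.
TABLE1 = [
    ("conv3_3", 4, 16),
    ("conv4_3", 8, 32),
    ("conv5_3", 16, 64),
    ("conv7", 32, 128),
    ("conv8_2", 64, 256),
    ("conv9_2", 128, 512),
]

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in input pixels


def default_boxes(image_w: int, image_h: int) -> List[Tuple[str, List[Box]]]:
    """Tile one square default box per feature-map cell for every Table 1 layer."""
    result = []
    for name, stride, size in TABLE1:
        # Assumption: the feature map has ceil(W / n) x ceil(H / n) cells.
        cols = math.ceil(image_w / stride)
        rows = math.ceil(image_h / stride)
        half = size / 2.0
        boxes: List[Box] = []
        for i in range(rows):
            for j in range(cols):
                cx = (j + 0.5) * stride  # box centre sits at the cell centre
                cy = (i + 0.5) * stride
                # Boxes are not clipped to the image border.
                boxes.append((cx - half, cy - half, cx + half, cy + half))
        result.append((name, boxes))
    return result
```

For a 640×640 input, the six layers yield 25600, 6400, 1600, 400, 100 and 25 boxes, 34125 in total. In every row of Table 1 the box size is four times the stride. On each layer, therefore, adjacent box centres are a quarter of a box width apart.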
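Table 1 Detection box parameters
Feature layer   Stride n   Box size (pixels)   Aspect ratio
conv3_3         4          16                  1
conv4_3         8          32                  1
conv5_3         16         64                  1
conv7           32         128                 1
conv8_2         64         256                 1
conv9_2         128        512                 1

Table 2 MAP comparison of different fusion methods
Model                    Dataset              MAP
Proposed fusion model    WIDER FACE (Hard)    0.879
Comparison model 1       WIDER FACE (Hard)    0.823
Comparison model 2       WIDER FACE (Hard)    0.836

Table 3 MAP comparison of experimental results
Method            Hard    Medium   Easy    Detection speed (fps)
Faster R-CNN      0.712   0.845    0.897   <10
SSD-face          0.737   0.882    0.910   <43
HR                0.831   0.914    0.925   <5
Proposed method   0.879   0.932    0.934   <35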
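Tables 2 and 3 report MAP on the WIDER FACE subsets. As a reminder of how such a number is obtained, the sketch below implements generic all-point interpolated average precision (the PASCAL VOC 2010-and-later convention).

Assumptions:

- Each detection has already been labelled a true or false positive by an IoU-based matching step, which is not shown.
- The official WIDER FACE evaluation script may differ in details.
- `average_precision` is a hypothetical name.

```python
from typing import List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP (PASCAL VOC 2010+ style).

    detections: (confidence, is_true_positive) pairs; the TP flag is assumed
    to come from an upstream IoU matching step that is not shown here.
    num_gt: number of ground-truth faces in the evaluated subset.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:  # walk down the ranked list, one threshold per detection
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Sentinel points at recall 0 and recall 1.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Precision envelope: make precision non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute 0.
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap
```

Consider three detections scored 0.9 (true positive), 0.8 (false positive) and 0.7 (true positive), evaluated against two ground-truth faces. Interpolated precision is 1 up to recall 0.5 and 2/3 from recall 0.5 to 1. AP is therefore 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.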
[1] JIANG Huaizu and LEARNED-MILLER E. Face detection with the Faster R-CNN[C]. IEEE International Conference on Automatic Face & Gesture Recognition, Washington, D.C., USA, 2017: 650–657.
[2] YANG Shuo, LUO Ping, LOY C, et al. WIDER FACE: A face detection benchmark[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5525–5533.
[3] CROSSWHITE N, BYRNE J, STAUFFER C, et al. Template adaptation for face verification and identification[C]. IEEE International Conference on Automatic Face & Gesture Recognition, Washington, D.C., USA, 2017: 1–8.
[4] MAJUMDAR A, SINGH R, and VATSA M. Face verification via class sparsity based supervised encoding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1273–1280. doi: 10.1109/TPAMI.2016.2569436.
[5] GAO Yuan, MA Jiayi, and YUILLE A L. Semi-supervised sparse representation based classification for face recognition with insufficient labeled samples[J]. IEEE Transactions on Image Processing, 2017, 26(5): 2545–2560. doi: 10.1109/TIP.2017.2675341.
[6] HARIS KHAN M, MCDONAGH J, and TZIMIROPOULOS G. Synergy between face alignment and tracking via discriminative global consensus optimization[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017: 3791–3799.
[7] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 580–587.
[8] VIOLA P and JONES M. Rapid object detection using a boosted cascade of simple features[C]. IEEE Computer Society Conference on Computer Vision & Pattern Recognition, Kauai, USA, 2001: 511.
[9] LI Jianguo, WANG Tao, and ZHANG Yimin. Face detection using SURF cascade[C]. IEEE International Conference on Computer Vision Workshops, Ontario, Canada, 2012: 2183–2190.
[10] MATHIAS M, BENENSON R, PEDERSOLI M, et al. Face detection without bells and whistles[C]. European Conference on Computer Vision, Zurich, Switzerland, 2014: 720–735.
[11] LI Haoxiang, LIN Zhe, SHEN Xiaohui, et al. A convolutional neural network cascade for face detection[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 5325–5334.
[12] WU Shuzhe, KAN M, SHAN Shiguang, et al. Funnel-structured cascade for multi-view face detection with alignment-awareness[J]. Neurocomputing, 2016, 221(C): 138–145.
[13] YANG Shuo, LUO Ping, CHEN C L, et al. Faceness-Net: Face detection through deep facial part responses[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(8): 1845–1859. doi: 10.1109/TPAMI.2017.2738644.
[14] GIRSHICK R. Fast R-CNN[C]. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1440–1448.
[15] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[16] LIU Wei, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]. European Conference on Computer Vision, Amsterdam, Netherlands, 2016: 21–37.
[17] DAI Jifeng, LI Yi, HE Kaiming, et al. R-FCN: Object detection via region-based fully convolutional networks[C]. Advances in Neural Information Processing Systems, Barcelona, Spain, 2016: 379–387.
[18] ZHU Chenchen, ZHENG Yutong, LUU K, et al. CMS-RCNN: Contextual multi-scale region-based CNN for unconstrained face detection[OL]. arXiv preprint arXiv:1606.05413, 2016.
[19] HU Peiyun and RAMANAN D. Finding tiny faces[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, USA, 2017: 1522–1530.
[20] ERHAN D, SZEGEDY C, TOSHEV A, et al. Scalable object detection using deep neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 2147–2154.
[21] CHEN Chenyi, LIU Mingyu, TUZEL O, et al. R-CNN for small object detection[C]. Asian Conference on Computer Vision, Taipei, China, 2016: 214–230.
[22] BELL S, LAWRENCE ZITNICK C, BALA K, et al. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2874–2883.
[23] WONG R Y and HALL E L. Sequential hierarchical scene matching[J]. IEEE Transactions on Computers, 1978, 27(4): 359–366. doi: 10.1109/TC.1978.1675108.
[24] FU C Y, LIU Wei, RANGA A, et al. DSSD: Deconvolutional single shot detector[OL]. arXiv preprint arXiv:1701.06659, 2017.
[25] WEI Xiang, ZHANG Dongqing, YU H, et al. Context-aware single-shot detector[C]. IEEE Winter Conference on Applications of Computer Vision, Lake Tahoe, USA, 2018: 1784–1793.
[26] HOWARD A G. Some improvements on deep convolutional neural network based image classification[OL]. arXiv preprint arXiv:1312.5402, 2013.