Multi-scale Pedestrian Detection in Infrared Images with Salient Background-awareness
Abstract: Infrared imaging systems with an Ultrawide Field Of View (U-FOV) cover a large monitoring range and are not limited by illumination, but their images contain objects at diverse scales, many of them small. To detect these objects accurately, a multi-scale infrared pedestrian detection method with background awareness is proposed, which improves small-object detection performance while reducing redundant computation. First, a four-scale feature pyramid network is constructed in which each scale predicts objects independently, supplementing high-resolution detail features. Second, an attention module is integrated into the lateral connections of the feature pyramid to generate salient features, suppressing the feature response of irrelevant regions and highlighting local object features. Finally, an anchor mask generation subnetwork is built on the saliency coefficients to constrain anchor locations, exclude flat background, and improve processing efficiency. Experimental results show that the saliency generation subnetwork increases processing time by only 5.94% and is therefore lightweight. The Average Precision on the U-FOV infrared pedestrian dataset reaches 93.20%, which is 26.49 percentage points higher than that of YOLOv3. The anchor box constraint strategy saves 18.05% of processing time. The proposed model is lightweight and accurate, and is suitable for detecting multi-scale infrared objects in U-FOV imagery.
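The second and third steps of the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the attention module produces one saliency coefficient per spatial location through a sigmoid, that the salient feature is the element-wise product of that coefficient with the lateral feature, and that the anchor mask is obtained by thresholding the coefficient. The function names, the threshold of 0.5, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def saliency_coefficients(logits):
    """Map raw attention scores to coefficients in (0, 1) with a sigmoid."""
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


def gate_features(features, coeffs):
    """Scale each lateral feature value by its saliency coefficient."""
    return [[f * c for f, c in zip(f_row, c_row)]
            for f_row, c_row in zip(features, coeffs)]


def anchor_mask(coeffs, threshold=0.5):
    """Keep anchors only where the coefficient exceeds the threshold
    (the threshold value is an assumption, not a value from the paper)."""
    return [[c > threshold for c in row] for row in coeffs]


# Toy 3x4 single-channel map: three locations respond strongly,
# the remaining nine stand in for flat background.
logits = [
    [-4.0, -3.0, -4.0, -5.0],
    [-3.0,  2.0,  3.0, -4.0],
    [-5.0,  1.0, -2.0, -3.0],
]
features = [[1.0] * 4 for _ in range(3)]

coeffs = saliency_coefficients(logits)
salient = gate_features(features, coeffs)   # background responses shrink toward 0
mask = anchor_mask(coeffs)                  # anchors survive only at salient cells

kept = sum(v for row in mask for v in row)
total = sum(len(row) for row in mask)
print(f"anchors kept: {kept}/{total}")
```

In this toy case only the three salient locations keep their anchors, so the detection head would skip the other nine. This is the kind of saving the anchor constraint aims for, although the real subnetwork is learned rather than hand-thresholded.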
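Table 1. Average precision (AP) of pedestrian detection under different IoU thresholds

| Method | Backbone | Training set | AP (IoU=0.3) | AP (IoU=0.45) | AP (IoU=0.5) | AP (IoU=0.7) |
|---|---|---|---|---|---|---|
| Faster R-CNN | ResNet101 | U-FOV | – | – | 0.5932 | – |
| SSD | Mobilenet_v1 | U-FOV | – | – | 0.5584 | – |
| R-FCN | ResNet101 | U-FOV | – | – | 0.6312 | – |
| CSP | ResNet50 | U-FOV | – | – | 0.8414 | – |
| YOLOv3 | Darknet53 | U-FOV | 0.6595 | 0.6671 | 0.6628 | 0.6461 |
| YOLOv3+FS | Darknet53 | U-FOV | 0.8880 | 0.8870 | 0.8828 | 0.8511 |
| YOLOv3+FS | Darknet53 | Caltech+U-FOV | 0.9057 | 0.9078 | 0.9084 | 0.8961 |
| Proposed method | Darknet53 | Caltech+U-FOV | 0.9201 | 0.9320 | 0.9315 | 0.9107 |

Table 2. Comparison of parameter counts

| Method | Total parameters | Trainable parameters | Non-trainable parameters |
|---|---|---|---|
| YOLOv3 | 61576342 | 61523734 | 52608 |
| Proposed method | 64861976 | 64806296 | 55680 |

Table 3. Total processing time and frame rate on the U-FOV test set images

| Method | YOLOv3 | YOLOv3+Attention | FS+Attention | Proposed method |
|---|---|---|---|---|
| Total time (s) | 90.35 | 95.72 | 125.39 | 107.25 |
| Frame rate (frames/s) | 7.32 | 6.91 | 5.27 | 6.16 |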
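Two of the figures quoted in the abstract can be traced back to Tables 1 and 3 with simple arithmetic, sketched below. Which table entries go with which claim is an assumption made here, since the excerpt does not say so explicitly. The parameter growth figure is derived from Table 2 and is not stated in the source.

```python
# Values copied from Tables 1 to 3; the choice of which entries to compare
# is an assumption, not something stated in the excerpt.
ap_iou045 = {"YOLOv3": 0.6671, "proposed": 0.9320}            # Table 1, IoU=0.45
total_time_s = {"YOLOv3": 90.35, "YOLOv3+Attention": 95.72}   # Table 3
total_params = {"YOLOv3": 61576342, "proposed": 64861976}     # Table 2

# Absolute AP difference, expressed in percentage points.
ap_gap_points = (ap_iou045["proposed"] - ap_iou045["YOLOv3"]) * 100

# Relative increase in total processing time caused by the attention module.
attention_overhead = (
    total_time_s["YOLOv3+Attention"] / total_time_s["YOLOv3"] - 1
) * 100

# Relative increase in total parameter count (derived, not quoted in the source).
param_growth = (total_params["proposed"] / total_params["YOLOv3"] - 1) * 100

print(f"AP gap at IoU=0.45: {ap_gap_points:.2f} points")
print(f"attention time overhead: {attention_overhead:.2f}%")
print(f"parameter growth: {param_growth:.2f}%")
```

Under these pairings the first two values match the 26.49 and 5.94% figures in the abstract. The 18.05% saving attributed to the anchor constraint is not included, because the excerpt does not indicate which Table 3 entries it is computed from.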
