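Pedestrian Re-IDentification Algorithm Based on Dual-domain Filtering and Triple Metric Learning

XIAO Jinsheng; GUO Haowen; ZHANG Shuhao; ZOU Wentao; WANG Yuanfang; XIE Honggang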

doi:10.11999/JEIT210385

cstr: 32379.14.JEIT210385

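1. School of Electronic Information, Wuhan University, Wuhan 430072, China
2. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan 430068, China

Funds: The National Natural Science Foundation of China (42101448)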

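Author biographies:
XIAO Jinsheng: male, Ph.D., associate professor, master's supervisor; research interests: image and video processing

GUO Haowen: male, master's student; research interests: image processing and pedestrian re-identification

ZHANG Shuhao: male, master's student; research interests: image and computer vision

ZOU Wentao: male, master's student; research interests: image and video processing

WANG Yuanfang: male, master's student; research interests: image and video processing

XIE Honggang: male, Ph.D., associate professor, master's supervisor; research interests: image and machine vision

Corresponding author:
XIE Honggang, xiehg@hbut.edu.cn

Chinese Library Classification (CLC) number: TN911.73; TP391.41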
Publication history
- Received: 2021-05-07
- Revised: 2022-09-02
- Available online: 2022-09-03
- Published: 2022-11-14

Abstract

Noise can be introduced when an image is captured, transmitted, or processed. When an image is heavily corrupted by noise, many pedestrian Re-IDentification (ReID) methods struggle to extract sufficiently expressive pedestrian features and show poor robustness. This paper addresses pedestrian re-identification on low-quality images and proposes constructing triplets by dual-domain filtering decomposition to train a metric learning model. The method has two main parts. First, the distribution characteristics of different kinds of image noise in surveillance video are analyzed, and images are enhanced by dual-domain filtering. Second, because dual-domain filtering decomposition separates image noise well, a new way of constructing triplets is proposed: in the training stage, the low-frequency original image and the high-frequency noise image generated by dual-domain filtering are used, together with the original image, as the input triplet, so that the network can further suppress the noise component. The loss function is also optimized by combining the triplet loss with the contrastive loss. Finally, re-ranking is used to expand the ranking list and improve identification accuracy. On the noisy Market-1501 and CUHK03 datasets, the average Rank-1 accuracy is 78.3% and 21.7%, and the mean Average Precision (mAP) is 66.9% and 20.5%, respectively. The Rank-1 accuracy loss caused by adding noise is only 1.9% and 7.8%, indicating that the proposed model is robust to noise.

Keywords: Pedestrian Re-IDentification (ReID); Dual-domain filtering; Metric learning; Triplet loss
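
The triplet construction and the combined loss described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a 3x3 box blur stands in for dual-domain filtering, the roles (original image as anchor, low-frequency image as positive, high-frequency residual as negative), the margin, and the weight `lam` are hypothetical, and flattened pixels are used as stand-in embeddings instead of learned network features.

```python
import math


def box_blur(img):
    """3x3 mean filter with edge replication (assumed stand-in for the low-pass step)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9.0
    return out


def decompose(img):
    """Split an image into a low-frequency part and a high-frequency residual."""
    low = box_blur(img)
    high = [[p - q for p, q in zip(row, low_row)] for row, low_row in zip(img, low)]
    return low, high


def flatten(img):
    """Flatten a 2D image into a 1D vector (stand-in for a feature embedding)."""
    return [p for row in img for p in row]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def contrastive_loss(a, b, same, margin=1.0):
    """Pull similar pairs together; push dissimilar pairs at least `margin` apart."""
    d = euclidean(a, b)
    return d ** 2 if same else max(0.0, margin - d) ** 2


def combined_loss(anchor, positive, negative, margin=1.0, lam=1.0):
    """Triplet loss plus a weighted contrastive term (the weighting is an assumption)."""
    trip = triplet_loss(anchor, positive, negative, margin)
    cont = (contrastive_loss(anchor, positive, True, margin)
            + contrastive_loss(anchor, negative, False, margin))
    return trip + lam * cont


# Toy 3x3 "image" with one bright outlier pixel acting as noise.
image = [[10.0, 10.0, 10.0], [10.0, 55.0, 10.0], [10.0, 10.0, 10.0]]
low, high = decompose(image)
anchor, positive, negative = flatten(image), flatten(low), flatten(high)
print(combined_loss(anchor, positive, negative))
```

In a real pipeline the three images would first pass through the shared network, and the distances would be computed between the resulting feature vectors.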

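Figure 1 Structure of the triplet metric model
Figure 2 Flowchart of dual-domain filtering
Figure 3 Overall structure of model training
Figure 4 Schematic of the testing stage
Figure 5 CMC curves of each method on the Market-1501 dataset
Figure 6 CMC curves of each method on the CUHK03 dataset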

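The results are reported as CMC curves, Rank-1 accuracy, and mAP. As a minimal sketch of how these metrics are commonly computed from ranked gallery lists (the function names and the toy data are illustrative assumptions, and the camera-based filtering of gallery entries used in standard ReID protocols is omitted):

```python
def cmc_curve(ranked_matches, max_rank):
    """Cumulative matching characteristic.

    ranked_matches: one list per query, holding booleans over the ranked
    gallery (True where the gallery identity equals the query identity).
    Returns, for k = 1..max_rank, the fraction of queries whose first
    correct match appears within the top k results.
    """
    hits = [0] * max_rank
    for matches in ranked_matches:
        first = next((i for i, m in enumerate(matches) if m), None)
        if first is not None:
            for k in range(first, max_rank):
                hits[k] += 1
    return [h / len(ranked_matches) for h in hits]


def average_precision(matches):
    """Mean of the precision values at each rank where a correct match occurs."""
    hits = 0
    precisions = []
    for rank, is_match in enumerate(matches, start=1):
        if is_match:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ranked_matches):
    return sum(average_precision(m) for m in ranked_matches) / len(ranked_matches)


# Three toy queries, each ranked against a gallery of three images.
queries = [
    [True, False, True],
    [False, True, False],
    [False, False, True],
]
cmc = cmc_curve(queries, max_rank=3)
print("Rank-1:", cmc[0])
print("mAP:", mean_average_precision(queries))
```

Rank-1 is the first point of the CMC curve, so the Rank-1 columns in the tables correspond to the leftmost values of the curves in Figures 5 and 6.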

Table 1 Performance comparison of different image enhancement methods on the Market-1501 and CUHK03 datasets (%); each cell is Market-1501/CUHK03

Noise	Metric	No processing [12]	K-SVD [13]	Huang et al. [14]	Kang et al. [15]	Luo et al. [16]	Son et al. [17]	Dual-domain filtering [11]
Gaussian noise	Rank-1	74.0/18.4	72.3/19.3	75.3/20.1	73.5/19.7	75.0/20.2	74.8/19.9	75.5/20.5
Gaussian noise	mAP	49.2/17.2	46.7/18.1	51.4/18.8	47.8/18.6	51.2/19.1	51.0/18.7	51.9/19.2
Salt-and-pepper noise	Rank-1	64.8/18.5	65.6/19.1	66.6/20.9	66.3/19.6	66.4/20.1	66.2/19.6	66.4/20.3
Salt-and-pepper noise	mAP	40.3/17.2	43.0/17.9	42.7/19.2	42.2/18.2	42.4/18.8	42.1/18.1	42.2/18.9
Rain noise	Rank-1	75.5/15.9	74.9/16.5	75.9/16.6	76.2/17.5	76.6/17.8	75.9/16.8	76.8/18.1
Rain noise	mAP	51.4/14.6	51.3/15.4	51.7/15.5	52.0/16.0	52.1/16.4	51.8/15.7	63.4/16.8
No noise	Rank-1	78.9/22.2	76.2/22.3	79.1/22.4	77.7/22.3	79.0/22.4	78.9/22.3	79.2/22.5
No noise	mAP	55.0/21.0	52.4/21.8	65.1/22.5	53.8/22.0	64.7/22.6	64.4/22.2	66.8/22.9

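The noisy test sets are built by corrupting images with Gaussian, salt-and-pepper, and rain noise. The page does not give the noise parameters, so the following is only a hedged sketch of the first two corruptions on a grayscale image stored as nested lists; `sigma`, `density`, and the function names are assumptions, and rain synthesis is not attempted here.

```python
import random


def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255]."""
    return [
        [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
        for row in img
    ]


def add_salt_and_pepper(img, density, rng):
    """Set roughly `density` of the pixels to 0 (pepper) or 255 (salt), half each."""
    out = []
    for row in img:
        new_row = []
        for p in row:
            r = rng.random()
            if r < density / 2:
                new_row.append(0.0)
            elif r < density:
                new_row.append(255.0)
            else:
                new_row.append(float(p))
        out.append(new_row)
    return out


rng = random.Random(0)  # fixed seed so the corruption is repeatable
clean = [[100.0, 150.0], [200.0, 50.0]]
gaussian = add_gaussian_noise(clean, sigma=25.0, rng=rng)       # sigma is assumed
impulse = add_salt_and_pepper(clean, density=0.1, rng=rng)      # density is assumed
```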

Table 2 Average metric gain of different image enhancement methods (%)

Dataset	Metric gain	K-SVD [13]	Huang et al. [14]	Kang et al. [15]	Luo et al. [16]	Son et al. [17]	Dual-domain filtering [11]
Market-1501/CUHK03	Rank-1 gain	-0.5/0.7	1.1/1.6	0.6/1.3	1.2/1.8	0.9/1.2	1.4/2.0
Market-1501/CUHK03	mAP gain	-0.02/0.8	1.6/1.5	0.3/1.3	1.6/1.8	1.3/1.2	5.5/2.0

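The gains above appear to be averages, over the three noisy conditions of the preceding table, of each method's improvement over the unprocessed baseline. This reading is inferred from the numbers rather than stated on the page, and not every entry matches it to the last digit. A minimal sketch that reproduces the K-SVD Rank-1 entries under that assumption:

```python
def average_gain(baseline, enhanced):
    """Mean per-condition difference (enhanced - baseline), in percentage points."""
    return sum(e - b for b, e in zip(baseline, enhanced)) / len(baseline)


# Rank-1 values for Gaussian, salt-and-pepper, and rain noise (no-noise row excluded).
baseline_market = [74.0, 64.8, 75.5]   # no processing, Market-1501
ksvd_market = [72.3, 65.6, 74.9]       # K-SVD, Market-1501
baseline_cuhk = [18.4, 18.5, 15.9]     # no processing, CUHK03
ksvd_cuhk = [19.3, 19.1, 16.5]         # K-SVD, CUHK03

print(round(average_gain(baseline_market, ksvd_market), 1))
print(round(average_gain(baseline_cuhk, ksvd_cuhk), 1))
```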

Table 3 Ablation experiment results

Conventional triplet network	Improved triplet network	Triplet loss	Contrastive loss	Re-ranking	Rank-1 (%)	mAP (%)
√		√	√	√	76.7	65.3
	√	√			74.9	53.2
	√	√	√		76.5	54.6
	√	√	√	√	78.7	67.8

Table 4 Rank-1 and mAP of each method on the Market-1501 dataset (%)

Method	Original image Rank-1	Original image mAP	Gaussian noise Rank-1	Gaussian noise mAP	Salt-and-pepper noise Rank-1	Salt-and-pepper noise mAP	Rain noise Rank-1	Rain noise mAP
LOMO+XQDA [2]	43.8	22.7	37.5	19.5	39.0	21.6	35.7	20.3
SpindleNet [7]	76.9	/	72.4	/	74.4	/	73.7	/
IDE_ResNet_50 [12]	78.9	55.0	74.2	49.4	75.5	51.9	75.1	50.6
SVDNet [20]	82.3	62.1	78.6	58.4	80.3	59.2	78.1	57.5
APR [21]	84.3	64.7	79.2	60.2	81.4	61.9	80.8	60.8
PIE [6]	79.3	56.0	74.8	50.5	76.1	53.0	77.1	54.2
Proposed model	79.8	68.3	77.5	65.8	78.6	67.1	78.7	67.8

Table 5 Average drop rate of Rank-1 and mAP for each method on the Market-1501 dataset (%)

Metric	LOMO+XQDA [2]	SpindleNet [7]	IDE_ResNet_50 [12]	SVDNet [20]	APR [21]	PIE [6]	Proposed model
Rank-1 drop	14.6	4.4	5.0	4.0	4.5	4.2	1.9
mAP drop	9.8	/	7.9	6.0	5.8	6.1	2.0

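The drop rates above are consistent with a relative drop: the clean score minus the mean of the three noisy scores, divided by the clean score. The page does not spell out the formula, so this is an inferred reading; the sketch below checks it against the Market-1501 results for the proposed model and for LOMO+XQDA.

```python
def average_drop_rate(clean, noisy):
    """Relative drop (%) from the clean score to the mean of the noisy scores."""
    mean_noisy = sum(noisy) / len(noisy)
    return 100.0 * (clean - mean_noisy) / clean


# Market-1501 scores: clean, then [Gaussian, salt-and-pepper, rain].
proposed_rank1 = average_drop_rate(79.8, [77.5, 78.6, 78.7])
proposed_map = average_drop_rate(68.3, [65.8, 67.1, 67.8])
lomo_rank1 = average_drop_rate(43.8, [37.5, 39.0, 35.7])

print(round(proposed_rank1, 1), round(proposed_map, 1), round(lomo_rank1, 1))
```

Under the same reading, the CUHK03 Rank-1 scores of the proposed model (23.5 clean; 21.1, 21.4, and 22.5 noisy) give a drop of about 7.8%, which matches the value quoted in the abstract.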

Table 6 Rank-1 and mAP of each method on the CUHK03 dataset (%)

Method	Original image Rank-1	Original image mAP	Gaussian noise Rank-1	Gaussian noise mAP	Salt-and-pepper noise Rank-1	Salt-and-pepper noise mAP	Rain noise Rank-1	Rain noise mAP
LOMO+XQDA [2]	14.8	13.6	8.2	9.5	12.7	10.8	10.5	10.1
SpindleNet [7]	33.8	/	31.7	/	32.2	/	31.5	/
IDE_ResNet_50 [12]	22.2	21.0	18.4	17.6	19.5	18.8	18.9	17.8
SVDNet [20]	40.9	37.8	37.2	28.8	38.4	31.6	36.4	27.1
APR [21]	45.7	46.8	42.4	41.6	43.3	42.7	42.8	42.1
PIE [6]	34.2	31.1	31.4	25.8	33.1	29.6	29.4	22.6
Proposed model	23.5	22.7	21.1	19.8	21.4	20.2	22.5	21.5

Table 7 Average drop rate of Rank-1 and mAP for each method on the CUHK03 dataset (%)

Metric	LOMO+XQDA [2]	SpindleNet [7]	IDE_ResNet_50 [12]	SVDNet [20]	APR [21]	PIE [6]	Proposed model
Rank-1 drop	29.3	5.9	14.7	8.7	6.3	8.5	7.8
mAP drop	25.5	/	14.0	22.8	10.0	16.4	9.7

References (21)

[1]	HE Guocai and LIU Xiabi. Unsupervised visual representation learning with image triplets mining[J]. Chinese Journal of Computers, 2018, 42(12): 2787–2803. doi: 10.11897/SP.J.1016.2018.02787
[2]	LIAO Shengcai, HU Yang, ZHU Xiangyu, et al. Person re-identification by local maximal occurrence representation and metric learning[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 2197–2206.
[3]	SANG Haifeng, WANG Chuanzheng, LÜ Yingyu, et al. Person re-identification based on multi-information flow convolutional neural network[J]. Acta Electronica Sinica, 2019, 47(2): 351–357. doi: 10.3969/j.issn.0372-2112.2019.02.014
[4]	LUO Hao, JIANG Wei, ZHANG Xuan, et al. AlignedReID++: Dynamically matching local information for person re-identification[J]. Pattern Recognition, 2019, 94: 53–61. doi: 10.1016/j.patcog.2019.05.028
[5]	LIU Yimin, JIANG Jianguo, QI Meibin, et al. Video-based person re-identification method based on GAN and pose estimation[J]. Acta Automatica Sinica, 2020, 46(3): 576–584. doi: 10.16383/j.aas.c180054
[6]	ZHENG Liang, HUANG Yujia, LU Huchuan, et al. Pose-invariant embedding for deep person re-identification[J]. IEEE Transactions on Image Processing, 2019, 28(9): 4500–4509. doi: 10.1109/TIP.2019.2910414
[7]	ZHAO Haiyu, TIAN Maoqing, SUN Shuyang, et al. Spindle net: Person re-identification with human body region guided feature decomposition and fusion[C]. The 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 907–915.
[8]	KÖSTINGER M, HIRZER M, WOHLHART P, et al. Large scale metric learning from equivalence constraints[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 2288–2295.
[9]	HERMANS A, BEYER L, and LEIBE B. In defense of the triplet loss for person re-identification[J]. arXiv preprint arXiv: 1703.07737, 2017.
[10]	CHEN Weihua, CHEN Xiaotang, ZHANG Jianguo, et al. Beyond triplet loss: A deep quadruplet network for person re-identification[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1320–1329.
[11]	XIAO Jinsheng, ZOU Wentao, CHEN Yunhua, et al. Single image rain removal based on depth of field and sparse coding[J]. Pattern Recognition Letters, 2018, 116: 212–217. doi: 10.1016/j.patrec.2018.10.006
[12]	ZHONG Zhun, ZHENG Liang, CAO Donglin, et al. Re-ranking person re-identification with k-reciprocal encoding[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3652–3661.
[13]	AHARON M, ELAD M, and BRUCKSTEIN A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation[J]. IEEE Transactions on Signal Processing, 2006, 54(11): 4311–4322. doi: 10.1109/TSP.2006.881199
[14]	HUANG Dean, KANG Liwei, YANG Minchun, et al. Context-aware single image rain removal[C]. 2012 IEEE International Conference on Multimedia and Expo, Melbourne, Australia, 2012: 164–169.
[15]	KANG Liwei, LIN C W, and FU Y H. Automatic single-image-based rain streaks removal via image decomposition[J]. IEEE Transactions on Image Processing, 2012, 21(4): 1742–1755. doi: 10.1109/TIP.2011.2179057
[16]	LUO Yu, XU Yong, and JI Hui. Removing rain from a single image via discriminative sparse coding[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 3397–3405.
[17]	SON C H and ZHANG Xiaoping. Rain detection and removal via shrinkage-based sparse coding and learned rain dictionary[J]. Journal of Imaging Science and Technology, 2020, 64(3): 30501. doi: 10.2352/J.ImagingSci.Technol.2020.64.3.030501
[18]	XIAO Jinsheng, LI Wenhao, JIANG Hong, et al. Three dimensional block-matching video denoising algorithm based on dual-domain filtering[J]. Journal on Communications, 2015, 36(9): 91–97. doi: 10.11959/j.issn.1000-436x.2015245
[19]	CHEN Qiaoyuan and CHEN Ying. TriHard loss based multi-task person re-identification[J]. Journal of Computer-Aided Design & Computer Graphics, 2019, 31(7): 1156–1165. doi: 10.3724/SP.J.1089.2019.17463
[20]	SUN Yifan, ZHENG Liang, DENG Weijian, et al. SVDNet for pedestrian retrieval[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3820–3828.
[21]	LIN Yutian, ZHENG Liang, ZHENG Zhedong, et al. Improving person re-identification by attribute and identity learning[J]. Pattern Recognition, 2019, 95: 151–161. doi: 10.1016/j.patcog.2019.06.006