Person Re-identification Based on Multi-scale and Attention Fusion Learning

Fenhua WANG; Bo ZHAO; Chao HUANG; Youqi YAN

doi:10.11999/JEIT190998

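Details

Author biographies:
Fenhua WANG: female, born in 1971, associate professor and master's supervisor. Her research interests are pattern recognition and intelligent information processing.

Bo ZHAO: male, born in 1994, master's student. His research interest is computer vision.

Chao HUANG: male, born in 1993, master's student. His research interest is computer vision.

Youqi YAN: male, born in 1997, master's student. His research interest is computer vision.

Corresponding author:
Fenhua WANG, wangfenhua@ustb.edu.cn

CLC number: TN911.73; TP391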
Publication history
- Received: 2019-12-13
- Revised: 2020-06-17
- Published online: 2020-07-20
- Issue date: 2020-12-08

Person Re-identification Based on Multi-scale Network Attention Fusion

Fenhua WANG^{1, 2, 3},
Bo ZHAO¹,
Chao HUANG¹,
Youqi YAN¹

1. School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China
2. Institute of Artificial Intelligence, University of Science and Technology Beijing, Beijing 100083, China
3. Beijing Engineering Research Center of Industrial Spectrum Imaging, Beijing 100083, China

Funds: National Key R&D Program of China, Key Special Project (2017YFB1400101-01); Fundamental Research Funds for the Central Universities, University of Science and Technology Beijing (FRF-BD-19-002A)

Abstract

Person re-identification depends critically on extracting discriminative pedestrian features, and convolutional neural networks offer strong feature extraction and representation capabilities. Because different features can be observed at different scales, a person re-identification method based on Multi-Scale Attention Network (MSAN) fusion is proposed. The method samples features at different depths of the network and fuses the sampled features to predict pedestrian identity. Feature maps at different depths have different expressive power, which enables the network to learn more fine-grained pedestrian features. In addition, an attention module is embedded in the residual network so that the network focuses more on key information, which strengthens its feature learning ability. The Rank-1 accuracy of the proposed method on the Market1501, DukeMTMC-reID and MSMT17_V1 datasets reaches 95.3%, 89.8% and 82.2%, respectively. Experiments show that the method makes full use of information from different network depths and of the attended key information, giving the model strong discriminative ability, and that the mean average precision (mAP) of the proposed model is better than that of most state-of-the-art algorithms.

Key words:
- Person re-identification
- Multiple scale
- Attention
- Residual network
- Metric learning
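
The fusion step described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes, for illustration only, that each sampled stage yields a feature map of shape channels x height x width, that sampling is global average pooling, and that fusion is concatenation. The abstract does not specify these details, and the names `global_average_pool` and `fuse_multi_scale` are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # channels x height x width


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce each channel of a C x H x W feature map to its mean value."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse_multi_scale(feature_maps: List[FeatureMap]) -> List[float]:
    """Pool each stage's feature map and concatenate the pooled vectors.

    Concatenation is an assumed fusion rule, not one taken from the paper.
    """
    fused: List[float] = []
    for feature_map in feature_maps:
        fused.extend(global_average_pool(feature_map))
    return fused


# Toy maps standing in for a shallow stage (2 channels, 2x2)
# and a deep stage (3 channels, 1x1).
shallow = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
deep = [[[0.5]], [[1.5]], [[2.5]]]

descriptor = fuse_multi_scale([shallow, deep])
print(descriptor)  # [4.0, 1.0, 0.5, 1.5, 2.5]
```

The fused descriptor keeps one entry per channel of every sampled stage, so shallow and deep stages both contribute to the final representation regardless of their spatial resolution.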

Figure 1 Framework of the multi-scale and attention fusion model

Figure 2 ResNet50 network architecture

Figure 3 Architecture of the Conv2_x module

Figure 4 Multi-scale structure

Figure 5 CBAM module

Figure 6 Triplet loss
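
The triplet loss named in the Figure 6 caption can be sketched as follows. This is the standard margin-based formulation, not necessarily the exact variant used by the authors. The Euclidean distance, the margin value of 0.3, and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float],
                 positive: Sequence[float],
                 negative: Sequence[float],
                 margin: float = 0.3) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin).

    The margin of 0.3 is an assumed value, not taken from the paper.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same identity, distance 1.0

# Easy negative: far from the anchor, so the loss is zero.
easy_loss = triplet_loss(anchor, positive, [3.0, 4.0])

# Hard negative: nearly as close as the positive, so the loss is positive.
hard_loss = triplet_loss(anchor, positive, [0.0, 1.2])

print(easy_loss, hard_loss)
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so only triplets that violate this separation contribute to training.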

Table 1 Accuracy validation results for the multi-scale fusion model (%)

Method	Market1501 Rank-1	Market1501 mAP	DukeMTMC-reID Rank-1	DukeMTMC-reID mAP	MSMT17_V1 Rank-1	MSMT17_V1 mAP
SSAN	94.9	87.9	86.1	67.7	81.4	66.3
SSAN (+RK)	95.3	93.7	86.0	75.6	84.6	73.8
MSAN	95.3	87.9	89.8	78.8	82.2	60.6
MSAN (+RK)	95.9	93.9	92.3	89.7	85.0	74.6
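
The Rank-1 and mAP columns reported in the tables can be read with the help of a small sketch. It assumes each query has already produced a ranked gallery list, represented here as booleans marking whether each ranked image shares the query identity. Benchmark protocols usually also filter out certain gallery images (for example, those from the same camera as the query); that step is omitted, and the function names are hypothetical.

```python
from typing import List, Tuple


def rank1_hit(ranked_matches: List[bool]) -> float:
    """1.0 if the top-ranked gallery image has the query identity, else 0.0."""
    return 1.0 if ranked_matches and ranked_matches[0] else 0.0


def average_precision(ranked_matches: List[bool]) -> float:
    """Mean of the precision values at each rank where a correct match occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(all_ranked_matches: List[List[bool]]) -> Tuple[float, float]:
    """Return (Rank-1 accuracy, mAP), both averaged over all queries."""
    n = len(all_ranked_matches)
    rank1 = sum(rank1_hit(r) for r in all_ranked_matches) / n
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    return rank1, mean_ap


queries = [
    [True, False, True, False],   # AP = (1/1 + 2/3) / 2 = 5/6
    [False, True, False, False],  # AP = (1/2) / 1 = 1/2
]
rank1, mean_ap = evaluate(queries)
print(f"Rank-1 = {rank1:.3f}, mAP = {mean_ap:.3f}")
```

Rank-1 only checks the first retrieved image, while mAP rewards placing every correct match near the top, which is why the two columns can move differently between methods.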

Table 2 Accuracy validation results for the CBAM module (%)

Method	Market1501 Rank-1	Market1501 mAP	DukeMTMC-reID Rank-1	DukeMTMC-reID mAP	MSMT17_V1 Rank-1	MSMT17_V1 mAP
MSN	94.4	86.2	87.5	77.2	79.6	56.0
MSN (+CBAM)	95.3	87.9	89.8	78.8	82.2	60.6
MSN (+RK)	95.3	93.1	90.9	89.2	83.2	72.0
MSN (+CBAM+RK)	95.9	93.9	92.3	89.7	85.0	74.6
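
As a worked reading of Table 2, the following sketch computes the gain, in percentage points, obtained by adding CBAM to MSN without re-ranking. The numbers are copied from the MSN and MSN (+CBAM) rows; the dictionary layout and the `gains` helper are illustrative only.

```python
# (Rank-1, mAP) pairs in percent, copied from Table 2.
msn = {
    "Market1501": (94.4, 86.2),
    "DukeMTMC-reID": (87.5, 77.2),
    "MSMT17_V1": (79.6, 56.0),
}
msn_cbam = {
    "Market1501": (95.3, 87.9),
    "DukeMTMC-reID": (89.8, 78.8),
    "MSMT17_V1": (82.2, 60.6),
}


def gains(before, after):
    """Per-dataset (Rank-1, mAP) differences, rounded to one decimal place."""
    return {
        name: tuple(round(a - b, 1) for a, b in zip(after[name], before[name]))
        for name in before
    }


cbam_gains = gains(msn, msn_cbam)
for dataset, (rank1_gain, map_gain) in cbam_gains.items():
    print(f"{dataset}: Rank-1 +{rank1_gain}, mAP +{map_gain}")
```

On these figures the attention module helps on every dataset, with the largest gain on MSMT17_V1 (+2.6 Rank-1, +4.6 mAP).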

Table 3 Accuracy comparison between the proposed MSAN algorithm and other state-of-the-art algorithms (%)

Method	Market1501 Rank-1	Market1501 mAP	DukeMTMC-reID Rank-1	DukeMTMC-reID mAP	MSMT17_V1 Rank-1	MSMT17_V1 mAP
SVDNet [21]	82.3	62.1	76.7	56.8	–	–
DPFL [22]	88.6	72.6	79.2	60.0	–	–
SVDNet+Era [23]	87.1	71.3	79.3	62.4	–	–
TriNET+Era [23]	83.9	68.7	73.0	56.6	–	–
DaRe [24]	89.0	76.0	80.2	64.5	–	–
GP-reid [25]	92.2	81.2	85.2	72.8	–	–
PCB [4]	92.3	77.4	81.9	65.3	68.2	40.4
Aligned-ReID [5]	92.6	82.3	–	–	–	–
PCB+RPP [4]	93.8	81.6	83.3	69.2	–	–
MGN [6]	95.7	86.9	88.7	78.4	–	–
BFENET [8]	94.2	84.3	86.8	72.1	–	–
IANet [19]	94.4	83.1	87.1	73.4	75.5	46.8
DGNet [18]	94.8	86.0	86.6	74.8	77.2	52.3
OSNet [20]	94.8	84.9	88.6	73.5	78.7	52.9
MSAN	95.3	87.9	89.8	78.8	82.2	60.6
MSAN (+RK)	95.9	93.9	92.3	89.7	85.0	74.6

References (25)

[1] FARENZENA M, BAZZANI L, PERINA A, et al. Person re-identification by symmetry-driven accumulation of local features[C]. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 2360–2367.

[2] ZHOU Zhiheng, LIU Kaiyi, HUANG Junchu, et al. Improved metric learning algorithm for person re-identification based on equidistance[J]. Journal of Electronics & Information Technology, 2019, 41(2): 477–483. doi: 10.11999/JEIT180336

[3] HIRZER M, ROTH P M, KÖSTINGER M, et al. Relaxed pairwise learned metric for person re-identification[C]. The 12th European Conference on Computer Vision, Florence, Italy, 2012: 780–793.

[4] SUN Yifan, ZHENG Liang, YANG Yi, et al. Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline)[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 480–496.

[5] LUO Hao, JIANG Wei, ZHANG Xuan, et al. AlignedReID++: Dynamically matching local information for person re-identification[J]. Pattern Recognition, 2019, 94: 53–61. doi: 10.1016/j.patcog.2019.05.028

[6] WANG Guanshuo, YUAN Yufeng, CHEN Xiong, et al. Learning discriminative features with multiple granularities for person re-identification[C]. 2018 ACM Multimedia Conference, Seoul, Korea, 2018: 274–282.

[7] CHEN Hongchang, WU Yancheng, LI Shaomei, et al. Person re-identification based on attribute hierarchy recognition[J]. Journal of Electronics & Information Technology, 2019, 41(9): 2239–2246. doi: 10.11999/JEIT180740

[8] DAI Zuozhuo, CHEN Mingqiang, GU Xiaodong, et al. Batch DropBlock network for person re-identification and beyond[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3691–3701.

[9] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. The 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 3–19.

[10] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2117–2125.

[11] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.

[12] HERMANS A, BEYER L, and LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. https://arxiv.org/abs/1703.07737, 2017.

[13] ZHENG Liang, SHEN Liyue, TIAN Lu, et al. Scalable person re-identification: A benchmark[C]. 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1116–1124.

[14] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]. 2016 European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 17–35.

[15] WEI Longhui, ZHANG Shiliang, GAO Wen, et al. Person transfer GAN to bridge domain gap for person re-identification[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 79–88.

[16] ZHONG Zhun, ZHENG Liang, CAO Donglin, et al. Re-ranking person re-identification with k-reciprocal encoding[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1318–1327.

[17] SALLEH S S, AZIZ N A A, MOHAMAD D, et al. Combining Mahalanobis and Jaccard distance to overcome similarity measurement constriction on geometrical shapes[J]. International Journal of Computer Science Issues, 2012, 9(4): 124–132.

[18] ZHENG Zhedong, YANG Xiaodong, YU Zhiding, et al. Joint discriminative and generative learning for person re-identification[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 2138–2147.

[19] HOU Ruibing, MA Bingpeng, CHANG Hong, et al. Interaction-and-aggregation network for person re-identification[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9317–9326.

[20] ZHOU Kaiyang, YANG Yongxin, CAVALLARO A, et al. Omni-scale feature learning for person re-identification[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3702–3712.

[21] SUN Yifan, ZHENG Liang, DENG Weijian, et al. SVDNet for pedestrian retrieval[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 3800–3808.

[22] CHEN Yanbei, ZHU Xiatian, and GONG Shaogang. Person re-identification by deep learning multi-scale representations[C]. 2017 IEEE International Conference on Computer Vision Workshops, Venice, Italy, 2017: 2590–2600.

[23] ZHONG Zhun, ZHENG Liang, KANG Guoliang, et al. Random erasing data augmentation[EB/OL]. https://arxiv.org/abs/1708.04896, 2017.

[24] WANG Yan, WANG Lequn, YOU Yurong, et al. Resource aware person re-identification across multiple resolutions[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8042–8051.

[25] ALMAZAN J, GAJIC B, MURRAY N, et al. Re-ID done right: Towards good practices for person re-identification[EB/OL]. https://arxiv.org/abs/1801.05339, 2018.
