Pedestrian Re-Identification Based on CNN and TransFormer Multi-scale Learning

CHEN Ying; KUANG Cheng

doi:10.11999/JEIT220601

Volume 45 Issue 6

Jun. 2023

CHEN Ying, KUANG Cheng. Pedestrian Re-Identification Based on CNN and TransFormer Multi-scale Learning[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2256-2263. doi: 10.11999/JEIT220601

School of Internet of Things Engineering, Jiangnan University, Wuxi 214122, China

Funding: National Natural Science Foundation of China (62173160)

Received Date: 2022-05-12
Revised Date: 2022-11-11
Available Online: 2022-11-19
Publish Date: 2023-06-10

Abstract

Person Re-IDentification (ReID) aims to retrieve specific pedestrian targets across surveillance cameras. To aggregate the multi-granularity features of pedestrian images and to address the correlation problem in deep feature mapping, a Person Re-Identification method based on CNN and TransFormer Multi-scale learning (CTM) is proposed. The CTM network consists of a global branch, a deep aggregation branch and a feature pyramid branch. The global branch extracts global features of pedestrian images as well as hierarchical features at different scales. The deep aggregation branch recursively aggregates the hierarchical CNN features to extract multi-scale features. The feature pyramid branch is a two-way pyramid structure which, combined with an attention module and orthogonal regularization, significantly improves the performance of the network. Experiments on three large-scale datasets demonstrate the effectiveness of CTM: on Market1501, DukeMTMC-reID and MSMT17, it achieves mAP/Rank-1 of 90.2%/96.0%, 82.3%/91.6% and 63.2%/83.7%, respectively, outperforming the existing methods it is compared with.

Keywords:
- Pedestrian Re-IDentification (ReID)
- TransFormer
- CNN
- Pyramid structure
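
The mAP/Rank-1 results quoted in the abstract follow the standard retrieval protocol used on Market1501, DukeMTMC-reID and MSMT17. The minimal Python sketch below shows how such figures are commonly computed from a query-by-gallery distance matrix. It is not the authors' evaluation code, and the function name `evaluate_reid` and its argument layout are hypothetical. For each query it ranks the gallery by ascending distance, discards gallery images that share both the query's identity and its camera (the usual cross-camera rule), and skips queries left with no correct match. Rank-1 is the fraction of queries whose top-ranked image is a correct match (the first point of the CMC curve). mAP is the mean, over queries, of the average precision of each ranked list. Benchmark toolkits may apply further dataset-specific filtering (for example, of junk or distractor images), which this sketch assumes away.

```python
from typing import Sequence, Tuple


def evaluate_reid(
    distmat: Sequence[Sequence[float]],
    q_pids: Sequence[int],
    g_pids: Sequence[int],
    q_camids: Sequence[int],
    g_camids: Sequence[int],
) -> Tuple[float, float]:
    """Return (mAP, Rank-1) for a query x gallery distance matrix.

    Hypothetical helper illustrating the common Market1501-style protocol:
    gallery images with the same identity AND camera as the query are
    ignored, and queries without any valid match are skipped.
    """
    ap_sum = 0.0
    rank1_hits = 0
    num_valid = 0
    for qi, row in enumerate(distmat):
        # Rank gallery indices by ascending distance (most similar first).
        order = sorted(range(len(row)), key=lambda gi: row[gi])
        # Drop same-identity, same-camera gallery entries for this query.
        kept = [
            gi for gi in order
            if not (g_pids[gi] == q_pids[qi] and g_camids[gi] == q_camids[qi])
        ]
        matches = [g_pids[gi] == q_pids[qi] for gi in kept]
        num_rel = sum(matches)
        if num_rel == 0:
            continue  # no cross-camera ground truth for this query
        num_valid += 1
        rank1_hits += int(matches[0])
        # Average precision: mean of precision@k over the ranks k of true matches.
        hits = 0
        precision_sum = 0.0
        for k, is_match in enumerate(matches, start=1):
            if is_match:
                hits += 1
                precision_sum += hits / k
        ap_sum += precision_sum / num_rel
    if num_valid == 0:
        raise ValueError("no query has a valid gallery match")
    return ap_sum / num_valid, rank1_hits / num_valid


if __name__ == "__main__":
    # Toy data: 2 queries, 4 gallery images (identities and cameras are made up).
    mAP, rank1 = evaluate_reid(
        distmat=[[0.2, 0.1, 0.05, 0.9], [0.3, 0.05, 0.4, 0.1]],
        q_pids=[1, 2], g_pids=[1, 2, 1, 3],
        q_camids=[0, 0], g_camids=[1, 1, 0, 1],
    )
    print(mAP, rank1)  # prints: 0.75 0.5
```

In the toy data, the first query's nearest gallery image is ignored as a same-camera match, so its correct match lands at rank 2 (AP 0.5, Rank-1 miss), while the second query retrieves its match first (AP 1.0). Practical toolkits usually vectorize the sorting step with an array library rather than calling `sorted` per query.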
