
Pedestrian Re-Identification Based on CNN and TransFormer Multi-scale Learning

CHEN Ying, KUANG Cheng

Citation: CHEN Ying, KUANG Cheng. Pedestrian Re-Identification Based on CNN and TransFormer Multi-scale Learning[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2256-2263. doi: 10.11999/JEIT220601

doi: 10.11999/JEIT220601

Article information
    Author biographies:

    CHEN Ying: Female, professor, and Ph.D. supervisor. Research interests: image processing, information fusion, and pattern recognition

    KUANG Cheng: Male, master's degree. Research interest: person re-identification

    Corresponding author:

    CHEN Ying, chenying@jiangnan.edu.cn

  • CLC number: TN911.73; TP273

Funds: The National Natural Science Foundation of China (62173160)
  • Abstract: Person re-identification (ReID) aims to retrieve specific pedestrian targets across non-overlapping surveillance cameras. To aggregate multi-granularity features of pedestrian images and further address the correlation among deep feature mappings, this paper proposes a person re-identification method based on CNN and TransFormer Multi-scale learning (CTM), trained end to end. The CTM network consists of a global branch, a deep aggregation branch, and a feature pyramid branch. The global branch extracts global features of pedestrian images as well as hierarchical features at different scales; the deep aggregation branch recurrently aggregates the hierarchical features of the CNN to extract multi-scale features; the feature pyramid branch is a bidirectional pyramid structure that, together with attention modules and orthogonal regularization, significantly improves network performance. Extensive experiments demonstrate the effectiveness of the proposed method: on the Market1501, DukeMTMC-reID, and MSMT17 datasets, mAP/Rank-1 reach 90.2%/96.0%, 82.3%/91.6%, and 63.2%/83.7% respectively, outperforming existing methods.
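To make the branch layout concrete, here is a minimal PyTorch sketch of a three-branch network in the spirit of the abstract. Everything below (module names, channel widths, the toy backbone, the single top-down pyramid pass) is a hypothetical illustration assumed for this sketch, not the paper's actual CTM implementation or its TFC module.

```python
# Minimal sketch of a CTM-style three-branch network, based only on the
# abstract above. All names, dimensions, and the toy backbone are
# hypothetical; the paper's actual branches (including its TFC module and
# the bidirectional pyramid) are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CTMSketch(nn.Module):
    def __init__(self, num_classes: int = 751, dim: int = 256):
        super().__init__()
        # Toy CNN backbone yielding hierarchical features at three scales.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, 2, 1), nn.BatchNorm2d(64), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, 2, 1), nn.BatchNorm2d(256), nn.ReLU())
        # Global branch: embeds the deepest feature map.
        self.global_head = nn.Linear(256, dim)
        # Deep aggregation branch: 1x1 projections fused across stages.
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in (64, 128, 256))
        # Feature pyramid branch: lateral connections for top-down fusion.
        self.lateral = nn.ModuleList(nn.Conv2d(c, dim, 1) for c in (64, 128, 256))
        self.classifier = nn.Linear(3 * dim, num_classes)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        gap = lambda t: F.adaptive_avg_pool2d(t, 1).flatten(1)

        g = self.global_head(gap(f3))  # global branch

        # Deep aggregation branch: running sum of projected stage features.
        agg = None
        for pj, f in zip(self.proj, (f1, f2, f3)):
            t = pj(f)
            agg = t if agg is None else F.interpolate(agg, size=t.shape[-2:]) + t
        a = gap(agg)

        # Feature pyramid branch: one top-down pass (one direction of a
        # bidirectional pyramid).
        laterals = [l(f) for l, f in zip(self.lateral, (f1, f2, f3))]
        top = laterals[-1]
        for lat in reversed(laterals[:-1]):
            top = lat + F.interpolate(top, size=lat.shape[-2:])
        p = gap(top)

        feat = torch.cat([g, a, p], dim=1)   # retrieval embedding
        return feat, self.classifier(feat)   # ID logits for training


if __name__ == "__main__":
    net = CTMSketch()
    feat, logits = net(torch.randn(2, 3, 256, 128))  # typical ReID input size
    print(feat.shape, logits.shape)  # [2, 768], [2, 751]
```

At test time only the concatenated embedding would be used for retrieval; the classifier head exists to supply an identification loss during training, as is standard in ReID pipelines.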
  • Figure 1  Overall network architecture of the proposed method

    Figure 2  The TFC module

    Figure 3  Visualization results on the Market1501 dataset

    Table 1  Performance comparison of different methods on public datasets (%)

    | Method       | Venue    | Market-1501 (mAP/Rank-1) | DukeMTMC-reID (mAP/Rank-1) | MSMT17 (mAP/Rank-1) |
    |--------------|----------|--------------------------|----------------------------|---------------------|
    | MHN[8]       | CVPR2019 | 85.0/95.1                | 77.2/89.1                  | –                   |
    | SONA[25]     | ICCV2019 | 88.6/95.6                | 78.1/89.3                  | –                   |
    | OSNet[5]     | ICCV2019 | 84.9/94.8                | 73.5/88.6                  | 52.9/78.7           |
    | HOReID[26]   | CVPR2020 | 84.9/94.2                | 75.6/86.9                  | –                   |
    | SNR[27]      | CVPR2020 | 84.7/94.4                | 72.9/84.4                  | –                   |
    | CACE-Net[28] | CVPR2020 | 90.3/96.0                | 81.3/90.1                  | 62.0/83.5           |
    | ISP[29]      | ECCV2020 | 88.6/95.3                | 80.0/89.6                  | –                   |
    | CDNet[30]    | CVPR2021 | 86.0/95.1                | 76.8/88.6                  | 54.7/78.9           |
    | HAT[16]      | MM2021   | 89.8/95.8                | 81.4/90.4                  | 61.2/82.3           |
    | L3DS[31]     | CVPR2021 | 87.3/95.0                | 76.1/88.2                  | –                   |
    | PAT[32]      | CVPR2021 | 88.0/95.4                | 78.2/88.8                  | –                   |
    | CTM (ours)   | –        | 90.2/96.0                | 82.3/91.6                  | 63.2/83.7           |
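For reference, mAP and Rank-1 are the standard ReID retrieval metrics: Rank-1 is the fraction of queries whose nearest gallery image shares the query identity, and mAP averages precision over each query's full ranked list. The sketch below shows a generic computation under cosine distance; it omits the Market1501 protocol's filtering of same-camera/same-identity gallery entries, which the paper's evaluation presumably applies.

```python
# Generic sketch of Rank-1 and mAP for ReID retrieval, independent of this
# paper; real evaluations additionally filter same-camera gallery matches.
import numpy as np

def rank1_and_map(query_feats, gallery_feats, query_ids, gallery_ids):
    """query_feats: [Q, D], gallery_feats: [G, D]; ids: integer arrays."""
    # Cosine distance via L2-normalised features.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    dist = 1.0 - q @ g.T                            # [Q, G]
    rank1_hits, average_precisions = [], []
    for i in range(len(q)):
        order = np.argsort(dist[i])                 # nearest gallery first
        matches = gallery_ids[order] == query_ids[i]
        rank1_hits.append(matches[0])
        # Average precision over the ranked list.
        hits = np.cumsum(matches)
        precision = hits / (np.arange(len(matches)) + 1)
        average_precisions.append((precision * matches).sum() / max(matches.sum(), 1))
    return float(np.mean(rank1_hits)), float(np.mean(average_precisions))

# Toy usage with random features and identities.
rng = np.random.default_rng(0)
r1, mAP = rank1_and_map(rng.normal(size=(5, 8)), rng.normal(size=(20, 8)),
                        rng.integers(0, 3, 5), rng.integers(0, 3, 20))
```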

    Table 2  Effect of each branch on the results (DukeMTMC-reID) (%)

    | Branch         | mAP  | Rank-1 | Rank-5 | Rank-10 |
    |----------------|------|--------|--------|---------|
    | HAT (Baseline) | 81.4 | 90.4   | 95.6   | 97.1    |
    | +DSAB          | 82.1 | 91.0   | 95.7   | 96.8    |
    | +FPB           | 82.0 | 91.2   | 95.7   | 97.2    |
    | CTM            | 82.3 | 91.6   | 95.9   | 97.5    |

    Note: + indicates that only that branch is used in the network.

    Table 3  Comparison of partitioning strategies (DukeMTMC-reID) (%)

    | Part-Level    | mAP  | Rank-1 | Rank-5 | Rank-10 |
    |---------------|------|--------|--------|---------|
    | +2 part-level | 81.9 | 90.8   | 95.3   | 96.8    |
    | +3 part-level | 82.3 | 91.6   | 95.9   | 97.5    |
    | +4 part-level | 82.3 | 91.2   | 95.4   | 97.0    |
    | +5 part-level | 81.8 | 90.2   | 95.6   | 97.2    |
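The k part-level settings refer to splitting a feature map into k horizontal stripes, the common striping strategy in ReID (e.g. PCB-style heads). Below is a minimal sketch of such pooling; the function name and shapes are assumed for illustration rather than taken from the paper.

```python
# Minimal sketch of k-way horizontal part-level pooling, the striping
# strategy the table's "k part-level" rows refer to; the paper's exact
# partitioning details may differ.
import torch
import torch.nn.functional as F

def part_level_features(feat_map: torch.Tensor, k: int = 3) -> torch.Tensor:
    """feat_map: [B, C, H, W] -> [B, k, C] stripe descriptors."""
    # Pool each of k horizontal stripes down to one C-dim vector.
    pooled = F.adaptive_avg_pool2d(feat_map, (k, 1))  # [B, C, k, 1]
    return pooled.squeeze(-1).transpose(1, 2)         # [B, k, C]

parts = part_level_features(torch.randn(2, 256, 24, 8), k=3)
print(parts.shape)  # torch.Size([2, 3, 256])
```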

    Table 4  Effect of orthogonal regularization on the results (DukeMTMC-reID) (%)

    | Method         | mAP  | Rank-1 | Rank-5 | Rank-10 |
    |----------------|------|--------|--------|---------|
    | HAT (Baseline) | 81.4 | 90.4   | 95.6   | 97.1    |
    | –SOR, COR      | 82.0 | 91.2   | 95.8   | 96.9    |
    | +COR           | 82.1 | 91.4   | 95.6   | 97.2    |
    | +SOR           | 82.3 | 91.5   | 95.6   | 97.1    |
    | CTM            | 82.3 | 91.6   | 95.9   | 97.5    |

    Note: + indicates that only that operation is used; – indicates that neither is used.
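SOR and COR here denote spatial and channel orthogonal regularization. A common way to impose such constraints is a soft penalty that pushes the Gram matrix of the (spatial or channel) feature vectors toward the identity; the sketch below illustrates that generic idea and is not the paper's exact formulation.

```python
# Generic soft orthogonality penalty illustrating the idea behind spatial
# and channel orthogonal regularization; the paper's SOR/COR definitions
# may differ.
import torch
import torch.nn.functional as F

def soft_orthogonality(feat: torch.Tensor, over_channels: bool = True) -> torch.Tensor:
    """feat: [B, C, H, W]; penalise correlation between channel vectors
    (over_channels=True) or between spatial-position vectors (False)."""
    m = feat.flatten(2)                     # [B, C, H*W]
    if not over_channels:
        m = m.transpose(1, 2)               # [B, H*W, C]
    m = F.normalize(m, dim=2)               # unit-length rows
    gram = m @ m.transpose(1, 2)            # pairwise cosine similarities
    eye = torch.eye(gram.shape[-1], device=feat.device)
    return ((gram - eye) ** 2).mean()       # -> 0 when rows are orthonormal

penalty = soft_orthogonality(torch.randn(2, 64, 16, 8), over_channels=True)
```

Such a penalty is typically added to the task losses with a small weight, decorrelating the learned features without constraining their magnitudes.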

    Table 5  Effect of attention modules on the results (DukeMTMC-reID) (%)

    | Attention               | mAP  | Rank-1 | Rank-5 | Rank-10 |
    |-------------------------|------|--------|--------|---------|
    | HAT (Baseline)          | 81.4 | 90.4   | 95.6   | 97.1    |
    | +RGA-C                  | 82.1 | 91.2   | 95.8   | 96.8    |
    | +RGA-S                  | 82.0 | 91.4   | 95.6   | 97.4    |
    | +Attention on backbone  | 81.9 | 91.4   | 95.4   | 96.9    |
    | +Attention on FPB       | 82.2 | 91.3   | 95.9   | 97.2    |
    | CTM                     | 82.3 | 91.6   | 95.9   | 97.5    |
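RGA-S and RGA-C are the spatial and channel variants of relation-aware global attention [21]. The full RGA module models pairwise relations and is considerably more involved; as a simple stand-in showing where such gates slot into a backbone, here is a minimal channel/spatial attention pair (SE-style and CBAM-style, assumed for illustration only).

```python
# Minimal channel and spatial attention gates, a simple stand-in for the
# RGA-C / RGA-S modules of [21]; the actual RGA design models pairwise
# feature relations and differs from these gates.
import torch
import torch.nn as nn

class ChannelGate(nn.Module):          # SE-style channel attention
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))           # [B, C] per-channel weights
        return x * w[:, :, None, None]

class SpatialGate(nn.Module):          # single-map spatial attention
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        # Summarise channels by mean and max, then predict a [B,1,H,W] mask.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * self.conv(s)

x = torch.randn(2, 64, 16, 8)
x = SpatialGate()(ChannelGate(64)(x))  # gate channels, then spatial positions
```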
  • [1] ZOU Guofeng, FU Guixia, GAO Mingliang, et al. A survey on metric learning in person re-identification[J]. Control and Decision, 2021, 36(7): 1547–1557. doi: 10.13195/j.kzyjc.2020.0801
    [2] BEN Xianye, XU Sen, and WANG Kejun. Review on pedestrian gait feature expression and recognition[J]. Pattern Recognition and Artificial Intelligence, 2012, 25(1): 71–81. doi: 10.16451/j.cnki.issn1003-6059.2012.01.010
    [3] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [4] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. The 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9.
    [5] ZHOU Kaiyang, YANG Yongxin, CAVALLARO A, et al. Omni-scale feature learning for person re-identification[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3701–3711.
    [6] WIECZOREK M, RYCHALSKA B, and DABROWSKI J. On the unreasonable effectiveness of centroids in image retrieval[C]. The 28th International Conference on Neural Information Processing, Sanur, Indonesia, 2021: 212–223.
    [7] KUANG Cheng and CHEN Ying. Multi-granularity feature fusion network for person re-identification[J]. Acta Electronica Sinica, 2021, 49(8): 1541–1550. doi: 10.12263/DZXB.20200974
    [8] CHEN Binghui, DENG Weihong, and HU Jiani. Mixed high-order attention network for person re-identification[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 371–381.
    [9] CHEN Xuesong, FU Canmiao, ZHAO Yong, et al. Salience-guided cascaded suppression network for person re-identification[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 3297–3307.
    [10] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010.
    [11] HAN Kai, WANG Yunhe, CHEN Hanting, et al. A survey on visual transformer[EB/OL]. https://doi.org/10.48550/arXiv.2012.12556, 2020.
    [12] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. https://doi.org/10.48550/arXiv.2010.11929, 2020.
    [13] HE Shuting, LUO Hao, WANG Pichao, et al. TransReID: Transformer-based object re-identification[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 14993–15002.
    [14] PENG Zhiliang, HUANG Wei, GU Shanzhi, et al. Conformer: Local features coupling global representations for visual recognition[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 357–366.
    [15] WANG Wenhai, XIE Enze, LI Xiang, et al. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021.
    [16] ZHANG Guowen, ZHANG Pingping, QI Jinqing, et al. HAT: Hierarchical aggregation transformers for person re-identification[C]. The 29th ACM International Conference on Multimedia, Chengdu, China, 2021: 516–525.
    [17] ZHANG Suofei, YIN Zirui, WU X, et al. FPB: Feature pyramid branch for person re-identification[EB/OL]. https://doi.org/10.48550/arXiv.2108.01901, 2021.
    [18] ZHENG Liang, SHEN Liyue, TIAN Lu, et al. Scalable person re-identification: A benchmark[C]. The 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 2015: 1116–1124.
    [19] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking[C]. The European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 17–35.
    [20] WEI Longhui, ZHANG Shiliang, GAO Wen, et al. Person transfer GAN to bridge domain gap for person re-identification[C]. The 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 79–88.
    [21] ZHANG Zhizheng, LAN Cuiling, ZENG Wenjun, et al. Relation-aware global attention for person re-identification[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 3183–3192.
    [22] CHEN Tianlong, DING Shaojin, XIE Jingyi, et al. ABD-net: Attentive but diverse person re-identification[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 8350–8360.
    [23] HERMANS A, BEYER L, and LEIBE B. In defense of the triplet loss for person re-identification[EB/OL]. https://doi.org/10.48550/arXiv.1809.05864, 2017.
    [24] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2818–2826.
    [25] BRYAN B, GONG Yuan, ZHANG Yizhe, et al. Second-order non-local attention networks for person re-identification[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3759–3768.
    [26] WANG Guan'an, YANG Shuo, LIU Huanyu, et al. High-order information matters: Learning relation and topology for occluded person re-identification[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 6448–6457.
    [27] JIN Xin, LAN Cuiling, ZENG Wenjun, et al. Style normalization and restitution for generalizable person re-identification[C]. The 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 3140–3149.
    [28] YU Fufu, JIANG Xinyang, GONG Yifei, et al. Devil's in the details: Aligning visual clues for conditional embedding in person re-identification[EB/OL]. https://doi.org/10.48550/arXiv.2009.05250, 2020.
    [29] ZHU Kuan, GUO Haiyun, LIU Zhiwei, et al. Identity-guided human semantic parsing for person re-identification[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 346–363.
    [30] LI Hanjun, WU Gaojie, and ZHENG Weishi. Combined depth space based architecture search for person re-identification[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 6725–6734.
    [31] CHEN Jiaxing, JIANG Xinyang, WANG Fudong, et al. Learning 3D shape feature for texture-insensitive person re-identification[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8142–8151.
    [32] LI Yulin, HE Jianfeng, ZHANG Tianzhu, et al. Diverse part discovery: Occluded person re-identification with part-aware transformer[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 2897–2906.
    [33] ZHOU Kaiyang and XIANG Tao. Torchreid: A library for deep learning person re-identification in pytorch[EB/OL]. https://doi.org/10.48550/arXiv.1910.10093, 2019.
Publication history
  • Received:  2022-05-12
  • Revised:  2022-11-11
  • Available online:  2022-11-19
  • Issue published:  2023-06-10
