
A Cross-modal Person Re-identification Method Based on Hybrid Channel Augmentation with Structured Dual Attention

ZHUANG Jianjun, ZHUANG Yuchen

Citation: ZHUANG Jianjun, ZHUANG Yuchen. A Cross-modal Person Re-identification Method Based on Hybrid Channel Augmentation with Structured Dual Attention[J]. Journal of Electronics & Information Technology, 2024, 46(2): 518-526. doi: 10.11999/JEIT230614

doi: 10.11999/JEIT230614
Funds: The National Key Research and Development Program of China (2021YFE0105500), The National Natural Science Foundation of China (62171228), The Qinglan Project of Jiangsu Higher Education Institutions
Details
    Author biographies:

    ZHUANG Jianjun: Male, Professor; his research interest is the intelligent processing of video signals

    ZHUANG Yuchen: Male, Master's student; his research interests include computer vision and person re-identification

    Corresponding author:

    ZHUANG Jianjun, jjzhuang@nuist.edu.cn

  • CLC number: TN911.73; TP391.4

  • Abstract: In current research on cross-modal person re-identification, most existing methods reduce the cross-modal discrepancy through locally shared features of single-modality raw visible images or of adversarially generated images, which leads to unstable recognition accuracy when discriminating infrared images because low-level feature information is lost. To address this problem, this paper proposes a feature-fusion cross-modal person re-identification method with structured dual attention and exchangeable hybrid random channel augmentation. Channel-augmented visible images serve as a third modality: an Image Channel-exchangeable random hybrid Augmentation (I-CSA) module extracts single-channel and three-channel randomly mixed augmentations from the visible images, highlighting the structural details of pedestrian pose and reducing the inter-modality gap during learning. A Structured joint Attention Feature Fusion (SAFF) module, while emphasizing the structural relationships of pedestrian pose across modalities, provides richer supervision for cross-modal representation learning and strengthens the robustness of shared features under modality changes. Under the single-shot setting of the all-search mode on the SYSU-MM01 dataset, the method reaches 71.2% Rank-1 and 68.1% mAP, outperforming comparable state-of-the-art methods.
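
As a rough illustration of the channel-level augmentation described above (randomly replacing a visible image with a single replicated channel or a random mix of its three channels), here is a minimal NumPy sketch. It is not the paper's I-CSA implementation: the function name, the 50/50 mode split `p_single`, and the use of a plain channel permutation in the three-channel branch are all assumptions made for illustration.

```python
import numpy as np

def hybrid_channel_augment(rgb, p_single=0.5, rng=None):
    """Channel-level augmentation of a visible image (H x W x 3, uint8).

    With probability p_single, replicate one randomly chosen channel
    across all three (single-channel mode); otherwise randomly exchange
    the order of the three channels (three-channel mixed mode).
    """
    rng = rng or np.random.default_rng()
    if rng.random() < p_single:
        c = int(rng.integers(3))                  # pick R, G, or B
        return np.repeat(rgb[:, :, c:c + 1], 3, axis=2)
    perm = rng.permutation(3)                     # exchange channel order
    return rgb[:, :, perm]

# Example on a dummy 256 x 128 pedestrian crop.
img = np.random.randint(0, 256, (256, 128, 3), dtype=np.uint8)
print(hybrid_channel_augment(img).shape)          # (256, 128, 3)
```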
  • Figure 1  Overall framework of the proposed model

    Figure 2  Examples of enhancement by the RGB-IR and RHCA modules

    Figure 3  Examples of random erasing data augmentation (see the sketch after this list)

    Figure 4  Structure of the SAFF module

    Figure 5  Structure of a single FAA branch in the SAFF module

    Figure 6  Comparison of the effect of embedding the SAFF module at different stages
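
Figure 3 shows random erasing; for reference, a generic sketch of that standard augmentation (Zhong et al.) follows. The probability `p` and the area/aspect-ratio ranges are common defaults assumed here, not necessarily the settings used in the paper.

```python
import numpy as np

def random_erasing(img, p=0.5, area_range=(0.02, 0.2),
                   aspect_range=(0.3, 3.3), rng=None):
    """With probability p, overwrite a random rectangle of a uint8
    image (H x W x C) with random noise."""
    rng = rng or np.random.default_rng()
    if rng.random() > p:
        return img
    h, w = img.shape[:2]
    for _ in range(10):                       # retry until a box fits
        area = rng.uniform(*area_range) * h * w
        aspect = rng.uniform(*aspect_range)
        eh = int(round((area * aspect) ** 0.5))
        ew = int(round((area / aspect) ** 0.5))
        if 0 < eh < h and 0 < ew < w:
            y = int(rng.integers(h - eh))
            x = int(rng.integers(w - ew))
            out = img.copy()
            out[y:y + eh, x:x + ew] = rng.integers(
                0, 256, (eh, ew) + img.shape[2:], dtype=img.dtype)
            return out
    return img
```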

    Table 1  Experimental comparison results on the SYSU-MM01 dataset under the single-shot setting (%)

    | Method | All-search Rank-1 | All-search Rank-10 | All-search Rank-20 | All-search mAP | Indoor-search Rank-1 | Indoor-search Rank-10 | Indoor-search Rank-20 | Indoor-search mAP |
    |---|---|---|---|---|---|---|---|---|
    | AlignGAN[5] | 42.4 | 85.0 | 93.7 | 40.7 | 35.9 | 87.6 | 94.4 | 54.3 |
    | AGW[20] | 47.5 | 84.4 | 92.1 | 47.7 | 54.2 | 91.1 | 96.0 | 63.0 |
    | DDAG[12] | 54.8 | 90.4 | 95.8 | 53.0 | 61.0 | 94.1 | 98.4 | 68.0 |
    | MID[21] | 60.3 | 92.9 | – | 59.4 | 64.9 | 96.1 | – | 70.1 |
    | SFANET[19] | 60.5 | 91.8 | 95.2 | 53.9 | 64.8 | 94.7 | 98.1 | 75.2 |
    | cm-SSFT[6] | 61.6 | 89.2 | 93.9 | 63.2 | 70.5 | 94.9 | 97.7 | 72.6 |
    | SPOT[22] | 65.3 | 92.7 | 97.0 | 62.3 | 69.4 | 96.2 | 99.1 | 74.6 |
    | MCLNet[26] | 65.4 | 93.3 | 97.1 | 62.0 | 72.6 | 97.0 | 99.2 | 76.6 |
    | FMCNet[23] | 66.3 | – | – | 62.5 | 68.2 | – | – | 74.1 |
    | Ours | 71.2 | 96.3 | 98.9 | 68.1 | 77.4 | 98.0 | 99.6 | 81.1 |

    Table 2  Experimental comparison results on the RegDB dataset (%). V→I: visible images querying infrared images; I→V: infrared images querying visible images.

    | Method | V→I Rank-1 | V→I Rank-10 | V→I Rank-20 | V→I mAP | I→V Rank-1 | I→V Rank-10 | I→V Rank-20 | I→V mAP |
    |---|---|---|---|---|---|---|---|---|
    | MAC[24] | 36.4 | 62.4 | 71.6 | 37.0 | – | – | – | – |
    | AlignGAN[5] | 57.9 | – | – | 53.6 | 56.3 | – | – | 53.4 |
    | DDAG[12] | 69.3 | 86.2 | 91.5 | 63.5 | 68.1 | 85.2 | 90.3 | 61.8 |
    | LbA[25] | 74.2 | – | – | 67.6 | 72.4 | – | – | 65.5 |
    | MCLNet[26] | 80.3 | 92.7 | 96.0 | 73.1 | 75.9 | 90.9 | 94.6 | 69.5 |
    | DCLNet[27] | 81.2 | – | – | 74.3 | 78.0 | – | – | 70.6 |
    | Ours | 86.3 | 97.2 | 98.7 | 79.8 | 85.5 | 97.0 | 98.3 | 78.1 |

    Table 3  Ablation study on the SYSU-MM01 dataset (%)

    | RHCA | TRE-DA | SAFF | EJ-Loss | Rank-1 | mAP |
    |---|---|---|---|---|---|
    |  |  |  |  | 48.8 | 46.6 |
    |  |  |  |  | 57.4 | 55.1 |
    |  |  |  |  | 60.6 | 57.3 |
    |  |  |  |  | 64.2 | 60.9 |
    |  |  |  |  | 68.2 | 64.3 |
    | ✓ | ✓ | ✓ | ✓ | 71.2 | 68.1 |

    Table 4  Results of the study on the embedding position of the SAFF module (%)

    | SAFF position | SYSU-MM01 Rank-1 | SYSU-MM01 mAP | RegDB Rank-1 | RegDB mAP |
    |---|---|---|---|---|
    | stage0 | 70.7 | 67.8 | 86.2 | 79.7 |
    | stage1 | 71.1 | 67.9 | 86.3 | 79.8 |
    | stage2 | 71.2 | 68.1 | 86.0 | 79.3 |
    | stage3 | 66.3 | 63.5 | 83.9 | 72.1 |
    | stage4 | 59.4 | 55.1 | 81.4 | 68.0 |
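
Table 4 varies the ResNet-50 [13] stage after which the SAFF module is embedded. As a hedged sketch of what "embedding at stage k" means mechanically, the PyTorch snippet below inserts a stand-in attention block (a simple squeeze-and-excitation-style gate, not the actual SAFF design) after a chosen stage of a torchvision ResNet-50; stage0 denotes the stem and stage1–stage4 denote layer1–layer4.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class GateAttention(nn.Module):
    """Stand-in channel-attention gate (squeeze-and-excitation style);
    the paper's SAFF module has a different, dual-branch structure."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x).unsqueeze(-1).unsqueeze(-1)

def backbone_with_attention(stage=2):
    """Embed the attention block right after the chosen ResNet-50 stage:
    stage0 is the stem, stage1-stage4 are layer1-layer4 (as in Table 4)."""
    m = resnet50(weights=None)
    stem = nn.Sequential(m.conv1, m.bn1, m.relu, m.maxpool)
    stages = [stem, m.layer1, m.layer2, m.layer3, m.layer4]
    channels = [64, 256, 512, 1024, 2048]    # output width of each stage
    stages.insert(stage + 1, GateAttention(channels[stage]))
    return nn.Sequential(*stages)

# Feature map for a batch of two 256 x 128 person crops.
feats = backbone_with_attention(stage=2)(torch.randn(2, 3, 256, 128))
print(feats.shape)  # torch.Size([2, 2048, 8, 4])
```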

    Table 5  Training results for different values of the $ \gamma $ parameter (%)

    | $ \gamma $ | Rank-1 | Rank-10 | Rank-20 | mAP |
    |---|---|---|---|---|
    | 0 | 66.7 | 94.6 | 97.1 | 64.7 |
    | 0.1 | 68.4 | 95.2 | 97.8 | 65.9 |
    | 0.3 | 69.9 | 95.8 | 98.5 | 67.3 |
    | 0.5 | 70.8 | 96.2 | 98.7 | 67.9 |
    | 0.7 | 59.3 | 91.1 | 95.4 | 57.7 |
    | 0.9 | 55.4 | 89.0 | 94.2 | 53.1 |
    | 0.47 | 71.2 | 96.3 | 98.9 | 68.1 |
  • [1] HUANG Yukun, FU Xueyang, LI Liang, et al. Learning degradation-invariant representation for robust real-world person Re-identification[J]. International Journal of Computer Vision, 2022, 130(11): 2770–2796. doi: 10.1007/s11263-022-01666-w.
    [2] YANG Lei. Continuous epoch distance integration for unsupervised person Re-identification[C]. The 5th International Conference on Communications, Information System and Computer Engineering, Guangzhou, China, 2023: 464–469. doi: 10.1109/cisce58541.2023.10142496.
    [3] XUAN Shiyu and ZHANG Shiliang. Intra-inter domain similarity for unsupervised person Re-identification[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022: 1. doi: 10.1109/tpami.2022.3163451.
    [4] DAI Pingyang, JI Rongrong, WANG Haibin, et al. Cross-modality person Re-identification with generative adversarial training[C]. Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 677–683. doi: 10.24963/ijcai.2018/94.
    [5] WANG Guan’an, ZHANG Tianzhu, CHENG Jian, et al. RGB-infrared cross-modality person Re-identification via joint pixel and feature alignment[C]. The IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 3622–3631. doi: 10.1109/ICCV.2019.00372.
    [6] LU Yan, WU Yue, LIU Bin, et al. Cross-modality person Re-identification with shared-specific feature transfer[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 13376–13386. doi: 10.1109/CVPR42600.2020.01339.
    [7] LI Xulin, LU Yan, LIU Bin, et al. Counterfactual intervention feature transfer for visible-infrared person Re-identification[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 381–398. doi: 10.1007/978-3-031-19809-0_22.
    [8] WANG Fengsui, YAN Tao, LIU Furong, et al. Multi-scale cross-modality person Re-identification method based on shared subspace features[J]. Journal of Electronics & Information Technology, 2023, 45(1): 325–334. doi: 10.11999/JEIT211212.
    [9] LIANG Tengfei, JIN Yi, LIU Wu, et al. Cross-modality transformer with modality mining for visible-infrared person Re-identification[J]. IEEE Transactions on Multimedia, 2023: 1–13. doi: 10.1109/tmm.2023.3237155.
    [10] XU Shengjun, LIU Qiuyuan, SHI Ya, et al. Person Re-identification based on diversified local attention network[J]. Journal of Electronics & Information Technology, 2022, 44(1): 211–220. doi: 10.11999/JEIT201003.
    [11] JIA Mengxi, SUN Yifan, ZHAI Yunpeng, et al. Semi-attention partition for occluded person Re-identification[C]. The 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 998–1006. doi: 10.1609/aaai.v37i1.25180.
    [12] YE Mang, SHEN Jianbing, CRANDALL D J, et al. Dynamic dual-attentive aggregation learning for visible-infrared person Re-identification[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 229–247. doi: 10.1007/978-3-030-58520-4_14.
    [13] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. The IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
    [14] WANG Qilong, WU Banggu, ZHU Pengfei, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 11531–11539. doi: 10.1109/CVPR42600.2020.01155.
    [15] WU Ancong, ZHENG Weishi, YU Hongxing, et al. RGB-infrared cross-modality person Re-identification[C]. The IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5390–5399. doi: 10.1109/ICCV.2017.575.
    [16] NGUYEN D T, HONG H G, KIM K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): 605. doi: 10.3390/s17030605.
    [17] KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84–90. doi: 10.1145/3065386.
    [18] SONG Shuang, CHAUDHURI K, and SARWATE A D. Stochastic gradient descent with differentially private updates[C]. IEEE Global Conference on Signal and Information Processing, Austin, USA, 2013: 245–248. doi: 10.1109/globalsip.2013.6736861.
    [19] LIU Haojie, MA Shun, XIA Daoxun, et al. SFANet: A spectrum-aware feature augmentation network for visible-infrared person reidentification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 1958–1971. doi: 10.1109/tnnls.2021.3105702.
    [20] YE Mang, SHEN Jianbing, LIN Gaojie, et al. Deep learning for person Re-identification: A survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872–2893. doi: 10.1109/TPAMI.2021.3054775.
    [21] HUANG Zhipeng, LIU Jiawei, LI Liang, et al. Modality-adaptive mixup and invariant decomposition for RGB-infrared person Re-identification[C/OL]. The 36th AAAI Conference on Artificial Intelligence, 2022: 1034–1042. doi: 10.1609/aaai.v36i1.19987.
    [22] CHEN Cuiqun, YE Mang, QI Meibin, et al. Structure-aware positional transformer for visible-infrared person Re-identification[J]. IEEE Transactions on Image Processing, 2022, 31: 2352–2364. doi: 10.1109/tip.2022.3141868.
    [23] ZHANG Qiang, LAI Changzhou, LIU Jianan, et al. FMCNet: Feature-level modality compensation for visible-infrared person Re-identification[C]. The IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 7339–7348. doi: 10.1109/cvpr52688.2022.00720.
    [24] YE Mang, LAN Xiangyuan, and LENG Qingming. Modality-aware collaborative learning for visible thermal person Re-identification[C]. The 27th ACM International Conference on Multimedia, Nice, France, 2019: 347–355. doi: 10.1145/3343031.3351043.
    [25] PARK H, LEE S, LEE J, et al. Learning by aligning: Visible-infrared person Re-identification using cross-modal correspondences[C]. The IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 12026–12035. doi: 10.1109/iccv48922.2021.01183.
    [26] HAO Xin, ZHAO Sanyuan, YE Mang, et al. Cross-modality person Re-identification via modality confusion and center aggregation[C]. The IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 16383–16392. doi: 10.1109/ICCV48922.2021.01609.
    [27] SUN Hanzhe, LIU Jun, ZHANG Zhizhong, et al. Not all pixels are matched: Dense contrastive learning for cross-modality person Re-identification[C]. The 30th ACM International Conference on Multimedia, Lisbon, Portugal, 2022: 5333–5341. doi: 10.1145/3503161.3547970.
Publication history
  • Received: 2023-06-21
  • Revised: 2023-11-03
  • Accepted: 2023-11-14
  • Available online: 2023-11-17
  • Published: 2024-02-29
