
Combining Visual-Textual Matching and Graph Embedding for Visible-Infrared Person Re-identification

ZHANG Hongying, FAN Shiyu, LUO Qian, ZHANG Tao

Citation: ZHANG Hongying, FAN Shiyu, LUO Qian, ZHANG Tao. Combining Visual-Textual Matching and Graph Embedding for Visible-Infrared Person Re-identification[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240318

doi: 10.11999/JEIT240318
Funds: Key Supported Project of the Civil Aviation Joint Research Fund of the National Natural Science Foundation of China (U2133211); Graduate Research Innovation Grant Program of Civil Aviation University of China (2023YJSKC05005)
Details
    Author biographies:

    ZHANG Hongying: Female, Ph.D., Professor, master's supervisor. Her research interests include image engineering and computer vision

    FAN Shiyu: Male, master's student. His research interests include computer vision and person re-identification

    LUO Qian: Male, Researcher. His research interests include civil aviation big data mining algorithms and simulation modeling of civil aviation big data platforms

    ZHANG Tao: Male, Senior Engineer. His research interest is smart airport operation technology

    Corresponding author:

    ZHANG Hongying carole_zhang0716@163.com

  • CLC number: TN911.73; TP391.41

  • Abstract: For visible-infrared cross-modality person Re-IDentification (Re-ID), most existing methods adopt a modality-translation strategy, generating images with adversarial networks to build links between the two modalities. However, these methods often fail to reduce the modality gap effectively, leading to poor re-identification performance. To address this problem, this paper proposes a two-stage cross-modality person re-identification method based on visual-textual matching and graph embedding. The method builds learnable text templates through a context-optimization scheme and generates pedestrian descriptions that serve as association information between modalities. Specifically, in the first stage, a Contrastive Language-Image Pre-training (CLIP) model is used to learn a unified textual description of the same pedestrian across modalities, which serves as prior information that helps reduce the modality discrepancy. In the second stage, a graph-embedding-based cross-modality constraint framework is introduced, with a modality-adaptive loss function designed to improve identification accuracy. To validate the proposed method, extensive experiments were conducted on the SYSU-MM01 and RegDB datasets; on SYSU-MM01, the Rank-1 accuracy and mean Average Precision (mAP) reach 64.2% and 60.2%, respectively. The results show that the proposed method improves the accuracy of visible-infrared cross-modality person re-identification.
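The first stage described in the abstract — learning a unified per-identity textual description against CLIP's image-text embedding space via context optimization — can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the feature dimension, the temperature, and all function names are hypothetical, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x):
    # CLIP matches L2-normalized image and text embeddings.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def image_to_text_logits(img_feats, text_feats, temperature=0.07):
    """Scaled cosine similarity between image features and per-identity
    text features (each text feature standing in for an encoded prompt
    with learnable context tokens, CoOp-style). Row i scores image i
    against every identity's description."""
    return l2_normalize(img_feats) @ l2_normalize(text_feats).T / temperature

def i2t_loss(logits, labels):
    # Stage-one objective: cross-entropy pulling each image toward the
    # text description of its own identity (numerically stable log-softmax).
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
num_ids, dim = 4, 8
text_feats = rng.normal(size=(num_ids, dim))  # hypothetical learned prompts
labels = np.arange(num_ids)
aligned = text_feats + 0.05 * rng.normal(size=(num_ids, dim))  # images matching their text
random_imgs = rng.normal(size=(num_ids, dim))
# Images whose features align with their identity's description incur lower loss.
assert i2t_loss(image_to_text_logits(aligned, text_feats), labels) < \
       i2t_loss(image_to_text_logits(random_imgs, text_feats), labels)
```

Because both modalities are matched against the same text description, the text acts as a modality-agnostic anchor, which is the intuition behind using it as prior information.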
  • Figure 1  Processing pipeline of the first stage

    Figure 2  Processing pipeline of the second stage

    Figure 3  Illustration of CoOp-based text prompt optimization

    Figure 4  Illustration of inter-modality feature reward and penalty

    Figure 5  Visualization of identification results

    Figure 6  Visualization results with different encoders
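The inter-modality reward and penalty of Figure 4 can be pictured with a toy graph-embedding objective: cross-modality edges between same-identity samples carry positive (reward) weights that pull features together, while edges between different identities carry negative (penalty) weights that push them apart. The sketch below is a schematic guess at that mechanism; the weights, names, and 2-D features are all hypothetical, not the paper's exact formulation.

```python
import numpy as np

def graph_embedding_loss(feats, labels, modalities, reward=1.0, penalty=-0.5):
    """Sum over cross-modality pairs of w_ij * ||f_i - f_j||^2, where
    w_ij > 0 (reward) for same-identity pairs and w_ij < 0 (penalty)
    for different-identity pairs. Lower is better: compact same-identity
    clusters and well-separated identities across modalities."""
    loss = 0.0
    for i in range(len(feats)):
        for j in range(len(feats)):
            if i == j or modalities[i] == modalities[j]:
                continue  # only cross-modality edges matter here
            w = reward if labels[i] == labels[j] else penalty
            loss += w * np.sum((feats[i] - feats[j]) ** 2)
    return loss

labels = np.array([0, 0, 1, 1])
mods   = np.array([0, 1, 0, 1])  # 0 = visible, 1 = infrared
# Features clustered by identity across modalities...
clustered = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
# ...versus features where identities no longer match the clusters.
mixed = np.array([[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]])
assert graph_embedding_loss(clustered, labels, mods) < \
       graph_embedding_loss(mixed, labels, mods)
```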

    Table 1  Comparison with other methods under the All-search mode of SYSU-MM01 (%) ("–": not reported)

    Method             | Single-shot                 | Multi-shot
                       | Rank-1 Rank-10 Rank-20 mAP  | Rank-1 Rank-10 Rank-20 mAP
    Zero-padding[13]   | 14.8   54.1    71.3    16.0 | –      61.4    78.4    10.9
    HCML[15]           | 14.3   53.2    69.2    16.2 | –      –       –       –
    BDTR[16]           | 27.3   67.0    81.7    27.3 | –      –       –       –
    eBDTR[17]          | 27.8   67.3    81.3    28.4 | –      –       –       –
    Hi-CMD[4]          | 34.9   77.6    –       35.9 | –      –       –       –
    DPMBN[18]          | 37.0   79.5    89.9    40.3 | –      –       –       –
    LZM[19]            | 45.0   89.0    –       45.9 | –      –       –       –
    AlignGAN[8]        | 42.4   85.0    93.7    40.7 | 51.5   89.4    95.7    33.9
    Xmodal[20]         | 49.9   89.8    96.0    50.7 | 47.6   88.1    96.0    36.1
    DDAG[21]           | 54.8   90.4    95.8    53.0 | –      –       –       –
    SFANET[22]         | 60.5   91.8    95.2    53.9 | –      –       –       –
    MID[23]            | 60.3   92.9    96.7    59.4 | –      –       –       –
    CM-NAS[24]         | 60.8   92.1    96.8    58.9 | 68.0   94.8    97.9    52.4
    Baseline(AGW)[12]  | 47.5   84.4    92.1    47.7 | –      –       –       –
    Ours               | 64.2   92.5    96.1    60.2 | 71.0   90.0    94.0    52.4

    Table 2  Comparison with other methods on the RegDB dataset (%) ("–": not reported)

    Method             | Visible query, infrared gallery | Infrared query, visible gallery
                       | Rank-1 Rank-10 Rank-20 mAP      | Rank-1 Rank-10 Rank-20 mAP
    Zero-padding[13]   | 17.8   34.2    44.4    18.9     | 16.6   34.7    44.3    17.8
    HCML[15]           | 24.4   47.5    56.8    20.0     | 21.7   45.0    55.6    22.2
    BDTR[16]           | 33.6   58.6    67.4    32.8     | 32.9   58.5    68.4    32.0
    eBDTR[17]          | 34.6   59.0    68.7    33.46    | 34.2   58.7    68.6    32.5
    AlignGAN[8]        | 57.9   –       –       53.6     | 56.3   –       –       53.4
    Xmodal[20]         | 62.2   83.1    91.7    60.2     | –      –       –       –
    DDAG[21]           | 69.3   86.2    91.5    63.5     | 68.1   85.2    90.3    61.8
    SFANET[22]         | 76.3   91.0    94.3    68.0     | 70.2   85.2    89.2    63.8
    Baseline(AGW)[12]  | 70.1   86.2    –       66.4     | 70.5   87.1    –       65.9
    Ours               | 73.0   88.1    94.4    67.7     | 72.8   87.1    90.1    66.2

    Table 3  Results under the All-search single-shot setting of SYSU-MM01 (%) ("–": not reported)

    Method             | Rank-1 | Rank-5 | Rank-10 | mAP
    Baseline           | 47.5   | –      | 86.2    | 47.7
    Baseline+CLIP      | 60.2   | 80.2   | 87.8    | 56.1
    Baseline+CLIP+MAGE | 62.9   | 82.7   | 90.1    | 58.6

    Table 4  Effect of the choice of loss functions on the evaluation metrics (%)

    Combination of $ {L_{{\text{i2t}}}} $ / $ {L_{{\text{id}}}} $ / $ {L_{{\text{tri}}}} $ / $ {L_{{\text{MAGE}}}} $ | Rank-1 | Rank-5 | Rank-10 | mAP
    | 51.6 | 75.8 | 83.0 | 42.9
    | 60.5 | 83.4 | 90.6 | 56.7
    | 4.4  | 12.3 | 22.2 | 5.3
    | 62.8 | 85.3 | 91.7 | 59.2
    | 64.2 | 86.5 | 92.5 | 60.2

    Table 5  Effect of single-stage vs. two-stage training with different image encoders (%)

    Encoder      | Training     | Rank-1 | Rank-5 | Rank-10 | mAP
    AGW[12]      | Single-stage | 61.3   | 84.5   | 91.2    | 57.6
    AGW[12]      | Two-stage    | 64.2   | 86.5   | 92.5    | 60.2
    ViT-B/16[25] | Single-stage | 62.5   | 79.7   | 88.7    | 58.3
    ViT-B/16[25] | Two-stage    | 63.0   | 81.4   | 91.3    | 60.6

    Table 6  Results on SYSU-MM01 for different values of the parameter $ \alpha $ (%)

    $ \alpha $ | Rank-1 | Rank-5 | Rank-10 | mAP
    0.5        | 56.0   | 80.4   | 88.6    | 54.8
    0.05       | 62.3   | 83.1   | 89.6    | 58.5
    0.005      | 62.9   | 87.2   | 90.3    | 58.6
    0.0005     | 59.6   | 83.1   | 90.1    | 56.9
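Tables 4 and 6 together suggest the overall objective combines an image-to-text term, an identity term, a triplet term, and the graph-embedding term (MAGE), with the last one scaled by a small $ \alpha $ (best at 0.005 in Table 6). A hedged sketch of such a composition follows; the equal weighting of the first three terms and the triplet margin of 0.3 are assumptions, not values from the paper.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Margin-based triplet term L_tri: the same-identity (positive)
    feature should sit at least `margin` closer to the anchor than the
    different-identity (negative) feature."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def total_loss(l_i2t, l_id, l_tri, l_mage, alpha=0.005):
    # Hypothetical composition: unit weights on the first three terms and
    # a small alpha on the graph-embedding term, as Table 6's sweep implies.
    return l_i2t + l_id + l_tri + alpha * l_mage

anchor   = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])  # same identity, other modality
negative = np.array([1.0, 1.0])  # different identity
assert triplet_loss(anchor, positive, negative) == 0.0  # already separated beyond the margin
assert triplet_loss(anchor, negative, positive) > 0.0   # a violating triplet is penalized
```

Table 6's pattern (performance degrading at both 0.5 and 0.0005) is consistent with the graph-embedding term acting as a regularizer: too large and it dominates the identity objectives, too small and its cross-modality constraint has no effect.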
  • [1] ZHANG Yongfei, YANG Hangyuan, ZHANG Yujia, et al. Recent progress in person re-ID[J]. Journal of Image and Graphics, 2023, 28(6): 1829–1862. doi: 10.11834/jig.230022.
    [2] WANG Fenhua, ZHAO Bo, HUANG Chao, et al. Person re-identification based on multi-scale network attention fusion[J]. Journal of Electronics & Information Technology, 2020, 42(12): 3045–3052. doi: 10.11999/JEIT190998.
    [3] LI Shuang, LI Fan, LI Jinxing, et al. Logical relation inference and multiview information interaction for domain adaptation person re-identification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023. doi: 10.1109/tnnls.2023.3281504.
    [4] CHOI S, LEE S, KIM Y, et al. Hi-CMD: Hierarchical cross-modality disentanglement for visible-infrared person re-identification[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 10254–10263. doi: 10.1109/cvpr42600.2020.01027.
    [5] HUANG Nianchang, LIU Jianan, LUO Yongjiang, et al. Exploring modality-shared appearance features and modality-invariant relation features for cross-modality person re-identification[J]. Pattern Recognition, 2023, 135: 109145. doi: 10.1016/j.patcog.2022.109145.
    [6] ZHANG Yukang and WANG Hanzi. Diverse embedding expansion network and low-light cross-modality benchmark for visible-infrared person re-identification[C]. Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 2153–2162. doi: 10.1109/CVPR52729.2023.00214.
    [7] DAI Pingyang, JI Rongrong, WANG Haibin, et al. Cross-modality person re-identification with generative adversarial training[C]. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 677–683.
    [8] WANG Guan’an, ZHANG Tianzhu, CHENG Jian, et al. RGB-infrared cross-modality person re-identification via joint pixel and feature alignment[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 2019: 3622–3631. doi: 10.1109/ICCV.2019.00372.
    [9] KOU Qiqi, HUANG Ji, CHENG Deqiang, et al. Person re-identification with intra-domain similarity grouping based on semantic fusion[J]. Journal on Communications, 2022, 43(7): 153–162. doi: 10.11959/j.issn.1000-436x.2022136.
    [10] LI Siyuan, SUN Li, and LI Qingli. CLIP-ReID: Exploiting vision-language model for image re-identification without concrete text labels[C]. Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, USA, 2023: 1405–1413. doi: 10.1609/aaai.v37i1.25225.
    [11] MORSING L H, SHEIKH-OMAR O A, and IOSIFIDIS A. Supervised domain adaptation using graph embedding[C]. 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 2021: 7841–7847. doi: 10.1109/icpr48806.2021.9412422.
    [12] YE Mang, SHEN Jianbing, LIN Gaojie, et al. Deep learning for person re-identification: A survey and outlook[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 2872–2893. doi: 10.1109/TPAMI.2021.3054775.
    [13] WU Ancong, ZHENG Weishi, YU Hongxing, et al. RGB-infrared cross-modality person re-identification[C]. Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 5390–5399. doi: 10.1109/iccv.2017.575.
    [14] NGUYEN D T, HONG H G, KIM K W, et al. Person recognition system based on a combination of body images from visible light and thermal cameras[J]. Sensors, 2017, 17(3): 605. doi: 10.3390/s17030605.
    [15] YE Mang, LAN Xiangyuan, LI Jiawei, et al. Hierarchical discriminative learning for visible thermal person re-identification[C]. Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, USA, 2018: 7501–7508. doi: 10.1609/aaai.v32i1.12293.
    [16] YE Mang, WANG Zheng, LAN Xiangyuan, et al. Visible thermal person re-identification via dual-constrained top-ranking[C]. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 1092–1099.
    [17] YE Mang, LAN Xiangyuan, WANG Zheng, et al. Bi-directional center-constrained top-ranking for visible thermal person re-identification[J]. IEEE Transactions on Information Forensics and Security, 2020, 15: 407–419. doi: 10.1109/tifs.2019.2921454.
    [18] XIANG Xuezhi, LV Ning, YU Zeting, et al. Cross-modality person re-identification based on dual-path multi-branch network[J]. IEEE Sensors Journal, 2019, 19(23): 11706–11713. doi: 10.1109/JSEN.2019.2936916.
    [19] BASARAN E, GÖKMEN M, and KAMASAK M E. An efficient framework for visible–infrared cross modality person re-identification[J]. Signal Processing: Image Communication, 2020, 87: 115933. doi: 10.1016/j.image.2020.115933.
    [20] LI Diangang, WEI Xing, HONG Xiaopeng, et al. Infrared-visible cross-modal person re-identification with an x modality[C]. Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 4610–4617. doi: 10.1609/aaai.v34i04.5891.
    [21] YE Mang, SHEN Jianbing, CRANDALL D J, et al. Dynamic dual-attentive aggregation learning for visible-infrared person re-identification[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 229–247. doi: 10.1007/978-3-030-58520-4_14.
    [22] LIU Haojie, MA Shun, XIA Daoxun, et al. SFANet: A spectrum-aware feature augmentation network for visible-infrared person reidentification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 1958–1971. doi: 10.1109/tnnls.2021.3105702.
    [23] HUANG Zhipeng, LIU Jiawei, LI Liang, et al. Modality-adaptive mixup and invariant decomposition for RGB-infrared person re-identification[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, 2022: 1034–1042. doi: 10.1609/aaai.v36i1.19987.
    [24] FU Chaoyou, HU Yibo, WU Xiang, et al. CM-NAS: Cross-modality neural architecture search for visible-infrared person re-identification[C]. Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 11803–11812. doi: 10.1109/ICCV48922.2021.01161.
    [25] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C]. 9th International Conference on Learning Representations, 2021.
Publication history
  • Received: 2024-04-22
  • Revised: 2024-06-24
  • Published online: 2024-06-27
