Discriminant Adversarial Hashing Transformer for Cross-modal Vessel Image Retrieval

GUAN Xin, GUO Jiaen, LU Yu

Citation: GUAN Xin, GUO Jiaen, LU Yu. Discriminant Adversarial Hashing Transformer for Cross-modal Vessel Image Retrieval[J]. Journal of Electronics & Information Technology, 2023, 45(12): 4411-4420. doi: 10.11999/JEIT220980

doi: 10.11999/JEIT220980
Funds: Taishan Scholar Engineering Special Fund (ts 201712072), National Defense Science and Technology Excellence Youth Talent Fund (2017-JCJQ-ZQ-003)
Article information
    Authors:

    GUAN Xin: Female, Ph.D., Professor. Research interests: information fusion, electronic countermeasures, and intelligent computing

    GUO Jiaen: Male, M.S. candidate. Research interests: multi-sensor information fusion

    LU Yu: Male, Ph.D. candidate. Research interests: passive cooperative localization and multi-source information fusion

    Corresponding author: GUO Jiaen, guojiaen@163.com

  • CLC number: TN913

  • Abstract: Mainstream cross-modal image retrieval algorithms built on the Convolutional Neural Network (CNN) paradigm cannot effectively extract the fine-grained features of vessel images, and the cross-modal "heterogeneity gap" is difficult to eliminate. To address these problems, a Discriminant Adversarial Hashing Transformer (DAHT) is proposed for fast cross-modal retrieval of vessel images. The network adopts a dual-stream Vision Transformer (ViT) structure and relies on the self-attention mechanism of ViT to extract discriminative features of vessel images, and a Hash Token structure is designed for hash generation. To eliminate the cross-modal discrepancy between images of the same class, the whole retrieval framework is trained in an adversarial manner, achieving modality confusion by discriminating the modality of the generated hash codes. A feedback-mechanism-based weighted cross-modal quintuplet loss (NW-DCQL) is further designed to preserve the network's semantic discrimination between different classes. In four cross-modal retrieval experiments on two datasets, the proposed method achieves performance gains of 9.8%, 5.2%, 19.7%, and 21.6% over the second-best results (32 bit), respectively, and it also retains a performance advantage in single-modal retrieval tasks.
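As a concrete reading of the pipeline described in the abstract, the sketch below implements one stream of the dual-stream ViT with a learnable Hash Token whose output is projected to a relaxed B-bit code. It is a minimal PyTorch illustration under assumed settings (224×224 inputs, 16×16 patches, ViT-Base sizes); the class name HashViT, the layer layout, and the hyperparameters are illustrative choices, not the authors' exact configuration.

import torch
import torch.nn as nn

class HashViT(nn.Module):
    """One stream of a dual-stream ViT hasher (illustrative sketch).

    A learnable Hash Token is prepended to the patch embeddings; its
    final-layer output is projected to num_bits values squashed by tanh,
    and sign() of that vector gives the binary code used at retrieval time.
    """
    def __init__(self, num_bits=32, dim=768, depth=12, heads=12):
        super().__init__()
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.hash_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, 1 + 14 * 14, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.hash_head = nn.Linear(dim, num_bits)

    def forward(self, x):                                    # x: (N, 3, 224, 224)
        p = self.patch_embed(x).flatten(2).transpose(1, 2)   # (N, 196, dim)
        t = self.hash_token.expand(p.size(0), -1, -1)        # one token per image
        h = self.encoder(torch.cat([t, p], dim=1) + self.pos_embed)
        return torch.tanh(self.hash_head(h[:, 0]))           # (N, num_bits), in (-1, 1)

In the adversarial scheme, one such stream per modality generates codes, and the modality discriminator of Table 1 is trained to tell which modality a code came from while the two streams are trained to fool it, pushing the code distributions of the two modalities together.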
  • Figure 1  Basic framework of DAHT

    Figure 2  Structure of IH-ViT

    Figure 3  Principle of the weighted quintuplet loss

    Figure 4  Comparison of PR curves of different networks for cross-modal retrieval (256 bit)

    Figure 5  Comparison of PR curves of different networks for single-modal retrieval (256 bit)

    Figure 6  Comparison of retrieval mAP under different parameter settings
    Table 1  Discriminator structure

    Layer           Parameter setting
    Linear 1        B × B/2
    Activation 1    ReLU
    Linear 2        B/2 × B/4
    Activation 2    ReLU
    Linear 3        B/4 × 1
    Activation 3    Tanh
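Read directly off Table 1, the discriminator is a three-layer MLP over a B-bit code. A PyTorch transcription follows (the helper name make_discriminator is ours):

import torch.nn as nn

def make_discriminator(B: int) -> nn.Sequential:
    """Modality discriminator per Table 1: maps a B-bit hash code to a
    single Tanh-squashed score used to judge which modality it came from."""
    return nn.Sequential(
        nn.Linear(B, B // 2), nn.ReLU(),       # Linear 1: B x B/2, Activation 1
        nn.Linear(B // 2, B // 4), nn.ReLU(),  # Linear 2: B/2 x B/4, Activation 2
        nn.Linear(B // 4, 1), nn.Tanh(),       # Linear 3: B/4 x 1, Activation 3
    )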

    Table 2  Cross-modal retrieval mAP comparison under different hash code lengths

    Dataset  Method  Task  32 bit  64 bit  128 bit  256 bit
    MPSC     DAHT    M2P   0.696   0.691   0.696    0.693
                     P2M   0.715   0.713   0.729    0.714
             AGAH    M2P   0.437   0.444   0.432    0.437
                     P2M   0.446   0.457   0.443    0.446
             DADH    M2P   0.455   0.458   0.446    0.432
                     P2M   0.453   0.461   0.470    0.439
             DCMH    M2P   0.378   0.400   0.332    0.440
                     P2M   0.346   0.370   0.268    0.422
             DCMHN   M2P   0.598   0.589   0.601    0.599
                     P2M   0.563   0.561   0.593    0.568
    VAIS     DAHT    V2I   0.599   0.582   0.617    0.603
                     I2V   0.603   0.615   0.611    0.635
             AGAH    V2I   0.390   0.401   0.387    0.368
                     I2V   0.369   0.390   0.383    0.361
             DADH    V2I   0.389   0.398   0.401    0.413
                     I2V   0.386   0.392   0.387    0.388
             DCMH    V2I   0.401   0.404   0.403    0.396
                     I2V   0.384   0.368   0.384    0.372
             DCMHN   V2I   0.402   0.399   0.411    0.428
                     I2V   0.387   0.379   0.402    0.404
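The mAP values in Tables 2 and 3 follow the standard hashing-retrieval protocol: rank the database by Hamming distance to each query code, then average the precision at every rank where a same-class item appears. A minimal NumPy sketch of that metric (our illustration, not the authors' evaluation script):

import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels):
    """mAP for binary codes in {-1, +1}. For +/-1 codes the inner product
    is a monotone proxy for Hamming distance (d = (B - q @ x) / 2), so
    sorting by descending similarity ranks the database nearest-first."""
    aps = []
    for q, ql in zip(query_codes, query_labels):
        order = np.argsort(-(db_codes @ q))          # ascending Hamming distance
        relevant = db_labels[order] == ql            # same-class flags, in rank order
        if not relevant.any():
            continue
        ranks = np.flatnonzero(relevant) + 1         # 1-based ranks of the hits
        precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps))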

    Table 3  Single-modal retrieval mAP comparison under different hash code lengths

    Dataset  Method  Task  32 bit  64 bit  128 bit  256 bit
    MPSC     DAHT    M2M   0.648   0.548   0.657    0.640
                     P2P   0.777   0.759   0.781    0.780
             DHN     M2M   0.678   0.685   0.668    0.651
                     P2P   0.545   0.551   0.559    0.543
             DSH     M2M   0.501   0.471   0.485    0.476
                     P2P   0.366   0.405   0.388    0.360
             DCH     M2M   0.695   0.683   0.692    0.669
                     P2P   0.561   0.557   0.572    0.544
             DFH     M2M   0.665   0.700   0.691    0.695
                     P2P   0.569   0.568   0.570    0.572
             DPN     M2M   0.646   0.651   0.659    0.654
                     P2P   0.532   0.551   0.536    0.553
    VAIS     DAHT    V2V   0.637   0.625   0.639    0.633
                     I2I   0.719   0.743   0.752    0.736
             DHN     V2V   0.613   0.602   0.620    0.641
                     I2I   0.504   0.529   0.509    0.510
             DSH     V2V   0.571   0.554   0.494    0.442
                     I2I   0.468   0.416   0.397    0.356
             DCH     V2V   0.631   0.659   0.667    0.656
                     I2I   0.512   0.529   0.521    0.499
             DFH     V2V   0.622   0.648   0.642    0.633
                     I2I   0.510   0.525   0.514    0.509
             DPN     V2V   0.620   0.634   0.663    0.645
                     I2I   0.487   0.491   0.492    0.489

    Table 4  Ablation study mAP values

    Network   Cross-modal retrieval              Single-modal retrieval
              M2P     P2M     V2I     I2V        M2M     P2P     V2V     I2I
    DAHT-1    0.680   0.691   0.601   0.615      0.636   0.764   0.540   0.722
    DAHT-2    0.600   0.655   0.595   0.591      0.600   0.755   0.519   0.725
    DAHT-3    0.682   0.689   0.591   0.584      0.630   0.761   0.534   0.707
    DAHT-4    0.668   0.679   0.600   0.595      0.630   0.731   0.509   0.726
    DAHT-5    0.608   0.631   0.555   0.579      0.566   0.725   0.521   0.661
    DAHT-6    0.668   0.692   0.588   0.596      0.638   0.762   0.529   0.715
    DAHT      0.693   0.714   0.603   0.635      0.640   0.780   0.553   0.736

    Table 5  Training time and parameter count comparison of different methods

    Method   Training time (s)   Parameters (M)
    DAHT     49.61               85.8
    DAHT-5   30.87               25.6
    DAHT-6   53.10               85.8
    AGAH     13.91               57.5
    DADH     17.05               50.8
    DCMH     10.84               47.1
    DCMHN    15.98               53.8
    DHN      11.63               23.6
    DSH      11.85               23.6
    DCH      11.57               23.6
    DFH      12.90               23.6
    DPN      11.75               23.6
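Parameter counts like those in Table 5 (in millions) can be reproduced by summing trainable tensor sizes; a trivial PyTorch helper (illustrative):

import torch.nn as nn

def count_params_millions(model: nn.Module) -> float:
    """Total trainable parameters, in millions (the unit of Table 5)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6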
  • [1] MUKHERJEE S, COHEN S, and GERTNER I. Content-based vessel image retrieval[J]. SPIE Automatic Target Recognition XXVI, Baltimore, USA, 2016, 9844: 984412.
    [2] HE Baiqing and WANG Zimin. The feedback mechanism of large-scale ship image retrieval[J]. Ship Science and Technology, 2018, 40(4A): 157–159. doi: 10.3404/j.issn.1672-7649.2018.4A.053
    [3] LI Yansheng, ZHANG Yongjun, HUANG Xin, et al. Learning source-invariant deep hashing convolutional neural networks for cross-source remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(11): 6521–6536. doi: 10.1109/TGRS.2018.2839705
    [4] XIONG Wei, LV Yafei, ZHANG Xiaohan, et al. Learning to translate for cross-source remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(7): 4860–4874. doi: 10.1109/TGRS.2020.2968096
    [5] ZHU Junyan, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2242–2251.
    [6] XIONG Wei, XIONG Zhenyu, CUI Yaqi, et al. A discriminative distillation network for cross-source remote sensing image retrieval[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 1234–1247. doi: 10.1109/JSTARS.2020.2980870
    [7] SUN Yuxi, FENG Shanshan, YE Yunming, et al. Multisensor fusion and explicit semantic preserving-based deep hashing for cross-modal remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5219614. doi: 10.1109/TGRS.2021.3136641
    [8] HU Peng, PENG Xi, ZHU Hongyuan, et al. Learning cross-modal retrieval with noisy labels[C]. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 5399–5409.
    [9] XU Xing, SONG Jingkuan, LU Huimin, et al. Modal-adversarial semantic learning network for extendable cross-modal retrieval[C]. 2018 ACM on International Conference on Multimedia Retrieval, Yokohama, Japan, 2018: 46–54.
    [10] WANG Bokun, YANG Yang, XU Xing, et al. Adversarial cross-modal retrieval[C]. The 25th ACM International Conference on Multimedia, Mountain View, USA, 2017: 154–162.
    [11] DONG Xinfeng, LIU Li, ZHU Lei, et al. Adversarial graph convolutional network for cross-modal retrieval[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1634–1645. doi: 10.1109/TCSVT.2021.3075242
    [12] GU Wen, GU Xiaoyan, GU Jingzi, et al. Adversary guided asymmetric hashing for cross-modal retrieval[C]. 2019 on International Conference on Multimedia Retrieval, Ottawa, Canada, 2019: 159–167.
    [13] HU Rong, YANG Jie, ZHU Bangpei, et al. Research on ship image retrieval based on BoVW model under hadoop platform[C]. The 1st International Conference on Information Science and Systems, Jeju, Korea, 2018: 156–160.
    [14] TIAN Chi, XIA Jinfeng, TANG Ji, et al. Deep image retrieval of large-scale vessels images based on BoW model[J]. Multimedia Tools and Applications, 2020, 79(13/14): 9387–9401. doi: 10.1007/s11042-019-7725-y
    [15] ZOU Lihua. Research on ship image retrieval method based on PCA dimension reduction[J]. Ship Science and Technology, 2020, 42(24): 97–99.
    [16] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C/OL]. The 9th International Conference on Learning Representations, 2021.
    [17] HERMANS A, BEYER L, and LEIBE B. In defense of the triplet loss for person re-identification[J]. arXiv: 1703.07737, 2017.
    [18] LI Tao, ZHANG Zheng, PEI Lishen, et al. HashFormer: Vision transformer based deep hashing for image retrieval[J]. IEEE Signal Processing Letters, 2022, 29: 827–831. doi: 10.1109/LSP.2022.3157517
    [19] LI Mengyang, SUN Weiwei, DU Xuan, et al. Ship classification by the fusion of panchromatic image and multi-spectral image based on pseudo Siamese lightweight network[J]. Journal of Physics: Conference Series, 2021, 1757: 012022.
    [20] ZHANG M M, CHOI J, DANIILIDIS K, et al. VAIS: A dataset for recognizing maritime imagery in the visible and infrared spectrums[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops, Boston, USA, 2015: 10–16.
    [21] GUAN Xin, GUO Jiaen, and YI Xiao. Low rank bilinear pooling attention network for ship target recognition[J]. Systems Engineering and Electronics, 2023, 45(5): 1305–1314.
    [22] BAI Cong, ZENG Chao, MA Qing, et al. Deep adversarial discrete hashing for cross-modal retrieval[C]. 2020 International Conference on Multimedia Retrieval, Dublin, Ireland, 2020: 525–531.
    [23] JIANG Qingyuan and LI Wujun. Deep cross-modal hashing[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 3270–3278.
    [24] XIONG Wei, XIONG Zhenyu, ZHANG Yang, et al. A deep cross-modality hashing network for SAR and optical remote sensing images retrieval[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 5284–5296. doi: 10.1109/JSTARS.2020.3021390
    [25] ZHU Han, LONG Mingsheng, WANG Jianmin, et al. Deep hashing network for efficient similarity retrieval[C]. The Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, USA, 2016: 2415–2421.
    [26] LIU Haomiao, WANG Ruiping, SHAN Shiguang, et al. Deep supervised hashing for fast image retrieval[J]. International Journal of Computer Vision, 2019, 127(9): 1217–1234. doi: 10.1007/s11263-019-01174-4
    [27] CAO Yue, LONG Mingsheng, LIU Bin, et al. Deep Cauchy hashing for hamming space retrieval[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1229–1237.
    [28] LI Yunqiang, PEI Wenjie, ZHA Yufei, et al. Push for quantization: Deep fisher hashing[C]. The 30th British Machine Vision Conference 2019, Cardiff, UK, 2019.
    [29] FAN Lixin, NG K W, JU Ce, et al. Deep polarized network for supervised learning of accurate binary hashing codes[C]. The Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 2020: 825–831.
Publication history
  • Received: 2022-07-22
  • Revised: 2023-01-27
  • Available online: 2023-02-08
  • Issue date: 2023-12-26
