高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于目标语义提示与双注意力感知的遥感图像文本检索方法

田澍 张秉熙 曹林 邢相薇 田菁 沈博 杜康宁 张晔

田澍, 张秉熙, 曹林, 邢相薇, 田菁, 沈博, 杜康宁, 张晔. 基于目标语义提示与双注意力感知的遥感图像文本检索方法[J]. 电子与信息学报. doi: 10.11999/JEIT240946
引用本文: 田澍, 张秉熙, 曹林, 邢相薇, 田菁, 沈博, 杜康宁, 张晔. 基于目标语义提示与双注意力感知的遥感图像文本检索方法[J]. 电子与信息学报. doi: 10.11999/JEIT240946
TIAN Shu, ZHANG Bingxi, CAO Lin, XING Xiangwei, TIAN Jing, SHEN Bo, DU Kangning, ZHANG Ye. Remote Sensing Image Text Retrieval Method Based on Object Semantic Prompt and Dual-Attention Perception[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240946
Citation: TIAN Shu, ZHANG Bingxi, CAO Lin, XING Xiangwei, TIAN Jing, SHEN Bo, DU Kangning, ZHANG Ye. Remote Sensing Image Text Retrieval Method Based on Object Semantic Prompt and Dual-Attention Perception[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240946

基于目标语义提示与双注意力感知的遥感图像文本检索方法

doi: 10.11999/JEIT240946
基金项目: 国家自然科学基金(62001032, 62201066, U20A20163),北京市教委科研计划(KZ202111232049, KM202111232014)
详细信息
    作者简介:

    田澍:男,副教授,研究方向为遥感图像处理

    张秉熙:男,硕士生,研究方向为遥感图像处理

    曹林:男,教授,研究方向为智能信号处理

    邢相薇:男,副研究员,研究方向为遥感信息智能解译

    田菁:女,研究员,研究方向为遥感信息智能解译

    沈博:女,高级工程师,研究方向为模式识别与智能信息处理

    杜康宁:男,副教授,研究方向为智能信号处理

    张晔:男,教授,研究方向为遥感图像处理

    通讯作者:

    曹林 charlin26@163.com

  • 中图分类号: TN911; TP751

Remote Sensing Image Text Retrieval Method Based on Object Semantic Prompt and Dual-Attention Perception

Funds: The National Natural Science Foundation of China (62001032, 62201066, U20A20163), Beijing Municipal Education Commission’s Scientific Research Plan (KZ202111232049, KM202111232014)
  • 摘要: 高分辨率遥感图像场景复杂、语义信息丰富多样且目标尺度多变,容易引起特征空间中不同类别目标的图像特征分布混淆,导致模型难以高效捕获遥感目标文本语义与图像特征的潜在关联,进而影响遥感图像文本检索的精度。针对这一问题,该文提出基于目标语义提示与双注意力感知的遥感图像文本检索方法。该方法首先引入空间-通道协同注意力,利用空间-通道维度注意权重交互捕捉图像全局上下文特征。同时,为了实现遥感图像显著目标信息的多粒度精准表征,模型通过所构建的基于自适应显著性区域目标感知注意力机制,通过动态多尺度目标特征加权聚合,提升对目标局部区域显著性特征聚焦响应。此外,该文设计了目标类别概率先验引导策略,对文本描述进行目标类别语义词频统计,以获取高概率先验目标语义信息,进而指导在跨模态共性嵌入空间中的图像特征聚类,最终实现高效准确的图像-文本特征对齐。该方法在RSICD与RSITMD两组遥感图像文本检索基准数据集上开展实验评估。结果表明,所设计的方法在检索精度指标上展现出了卓越的性能优势。
  • 图  1  OSDPM网络模型结构图

    图  2  DAPN网络模型结构图

    图  3  文本描述中进行目标类别语义名词词频统计方法示例

    图  4  OSFCM模块与模型训练策略

    图  5  DAPN模块注意力热图可视化实验示例

    图  6  OSDPM模型与HVSA模型文本对图像检索结果对比图

    图  7  OSDPM模型与HVSA模型图像对文本检索结果对比图

    图  8  共性特征空间中t-SNE特征分布可视化实验

    表  1  与其他方法在RSICD测试集上的检索性能比较(%)

    图像→文本 文本→图像 Rsum
    方法名称 R@1 R@5 R@10 R@1 R@5 R@10
    VSE++ 3.38 9.51 17.46 2.82 11.32 18.10 62.59
    SCAN t2i 4.39 10.90 17.64 3.91 16.20 26.49 79.53
    SCAN i2t 5.85 12.89 19.84 3.71 16.40 26.73 85.42
    CAMP-triplet 5.12 12.89 21.12 4.15 15.23 27.81 86.32
    CAMP-bee 4.20 10.24 15.45 2.72 12.76 22.89 68.26
    MTFN 5.02 12.52 19.74 4.90 17.17 29.49 88.84
    LW-MCR(b) 4.57 13.71 20.11 4.02 16.47 28.23 87.11
    LW-MCR(d) 3.29 12.52 19.93 4.66 17.51 30.02 87.93
    AMFMN-soft 5.05 14.53 21.57 5.05 19.74 31.04 96.98
    AMFMN-fusion 5.39 15.08 23.40 4.90 18.28 31.44 98.49
    AMFMN-sim 5.21 14.72 21.57 4.08 17.00 30.60 93.18
    GaLR 6.50 18.91 29.70 5.11 19.57 31.92 111.71
    HVSA 7.47 20.62 32.11 5.51 21.13 34.13 120.97
    OSDPM 8.39 22.20 33.21 6.59 22.92 36.67 129.98
    下载: 导出CSV

    表  2  与其他方法在RSITMD测试集上的检索性能比较(%)

    图像→文本 文本→图像 Rsum
    方法名称 R@1 R@5 R@10 R@1 R@5 R@10
    VSE++ 10.38 27.65 39.60 7.79 24.87 38.67 148.96
    SCAN t2i 10.18 28.53 38.49 10.10 28.98 43.53 159.81
    SCAN i2t 11.06 25.88 39.38 9.82 29.38 42.12 157.64
    CAMP-triplet 11.73 26.99 38.05 8.27 27.79 44.34 157.17
    CAMP-bee 9.07 23.01 33.19 5.22 23.32 38.36 132.17
    MTFN 10.40 27.65 36.28 9.96 31.37 45.84 161.50
    LW-MCR(b) 9.07 22.79 38.05 6.11 27.74 49.56 153.32
    LW-MCR(d) 10.18 28.98 39.82 7.79 30.18 49.78 166.73
    AMFMN-soft 11.06 25.88 39.82 9.82 33.94 51.90 172.42
    AMFMN-fusion 11.06 29.20 38.72 9.96 34.03 52.96 175.93
    AMFMN-sim 10.63 24.78 41.81 11.51 34.69 54.87 178.29
    GaLR 13.05 30.09 42.70 10.47 36.34 53.35 186.00
    HVSA 13.20 32.08 45.58 11.43 39.20 57.45 198.94
    OSDPM 14.68 35.10 49.48 11.68 39.38 57.45 207.77
    下载: 导出CSV

    表  3  所提出的OSDPM遥感图像文本检索模型在RSICD与RSITMD两组测试集上的消融实验(%)

    数据集 CLIP DAPN OSFCM 图像→文本 文本→图像 Rsum
    R@1 R@5 R@10 R@1 R@5 R@10
    RSICD 7.47 21.23 32.63 6.23 22.01 35.74 125.31
    8.23 21.32 32.72 6.29 22.18 36.26 127.01
    8.39 22.20 33.21 6.58 22.92 36.67 129.98
    RSITMD 13.94 33.41 46.76 12.33 38.86 57.36 202.65
    15.49 34.14 47.05 12.53 38.68 56.40 204.28
    14.67 35.10 49.48 11.68 39.38 57.45 207.77
    下载: 导出CSV
  • [1] SUDMANNS M, TIEDE D, LANG S, et al. Big earth data: Disruptive changes in Earth observation data management and analysis?[J]. International Journal of Digital Earth, 2020, 13(7): 832–850. doi: 10.1080/17538947.2019.1585976.
    [2] ZHOU Weixun, NEWSAM S, LI Congmin, et al. PatternNet: A benchmark dataset for performance evaluation of remote sensing image retrieval[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2018, 145: 197–209. doi: 10.1016/j.isprsjprs.2018.01.004.
    [3] 翁星星, 庞超, 许博文, 等. 面向遥感图像解译的增量深度学习[J]. 电子与信息学报, 2024, 46(10): 3979–4001. doi: 10.11999/JEIT240172.

    WENG Xingxing, PANG Chao, XU Bowen, et al. Incremental deep learning for remote sensing image interpretation[J]. Journal of Electronics & Information Technology, 2024, 46(10): 3979–4001. doi: 10.11999/JEIT240172.
    [4] HUANG Jiaxiang, FENG Yong, ZHOU Mingliang, et al. Deep multiscale fine-grained hashing for remote sensing cross-modal retrieval[J]. IEEE Geoscience and Remote Sensing Letters, 2024, 21: 6002205. doi: 10.1109/LGRS.2024.3351368.
    [5] 金澄, 弋步荣, 曾志昊, 等. 一种顾及空间语义的跨模态遥感影像检索技术[J]. 中国电子科学研究院学报, 2023, 18(4): 328–335,385. doi: 10.3969/j.issn.1673-5692.2023.04.005.

    JIN Cheng, YI Burong, ZENG Zhihao, et al. A cross-modal remote sensing image retrieval technique considering spatial semantics[J]. Journal of China Academy of Electronics and Information Technology, 2023, 18(4): 328–335,385. doi: 10.3969/j.issn.1673-5692.2023.04.005.
    [6] 冯孝鑫, 王子健, 吴奇. 基于三元采样图卷积网络的半监督遥感图像检索[J]. 电子与信息学报, 2023, 45(2): 644–653. doi: 10.11999/JEIT211478.

    FENG Xiaoxin, WANG Zijian, and WU Qi. Semi-supervised learning remote sensing image retrieval method based on triplet sampling graph convolutional network[J]. Journal of Electronics & Information Technology, 2023, 45(2): 644–653. doi: 10.11999/JEIT211478.
    [7] ZHAO Zuopeng, MIAO Xiaoran, HE Chen, et al. Masking-based cross-modal remote sensing image–text retrieval via dynamic contrastive learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5626215. doi: 10.1109/TGRS.2024.3406897.
    [8] WANG Fei, ZHU Xianzhang, LIU Xiaojian, et al. Scene graph-aware hierarchical fusion network for remote sensing image retrieval with text feedback[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4705116. doi: 10.1109/TGRS.2024.3404605.
    [9] 张若愚, 聂婕, 宋宁, 等. 基于布局化-语义联合表征遥感图文检索方法[J]. 北京航空航天大学学报, 2024, 50(2): 671–683. doi: 10.13700/j.bh.1001-5965.2022.0527.

    ZHANG Ruoyu, NIE Jie, SONG Ning, et al. Remote sensing image-text retrieval based on layout semantic joint representation[J]. Journal of Beijing University of Aeronautics and Astronautics, 2024, 50(2): 671–683. doi: 10.13700/j.bh.1001-5965.2022.0527.
    [10] CHEN Yaxiong, HUANG Jinghao, LI Xiaoyu, et al. Multiscale salient alignment learning for remote-sensing image-text retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4700413. doi: 10.1109/TGRS.2023.3340870.
    [11] YANG Rui, WANG Shuang, HAN Yingping, et al. Transcending fusion: A multiscale alignment method for remote sensing image-text retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4709217 . doi: 10.1109/TGRS.2024.3496898.
    [12] XU Yahui, BIN Yi, WEI Jiwei, et al. Align and retrieve: Composition and decomposition learning in image retrieval with text feedback[J]. IEEE Transactions on Multimedia, 2024, 26: 9936–9948. doi: 10.1109/TMM.2024.3417694.
    [13] MA Qing, PAN Jiancheng, and BAI Cong. Direction-oriented visual-semantic embedding model for remote sensing image-text retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 4704014. doi: 10.1109/TGRS.2024.3392779.
    [14] 钟金彦, 陈俊, 李宇, 等. 基于MFF-SFE的遥感图文跨模态检索方法[J]. 中国科学院大学学报(中英文), 2025, 42(2): 236–247. doi: 10.7523/j.ucas.2024.025.

    ZHONG Jinyan, CHEN Jun, LI Yu, et al. Cross-modal retrieval method based on MFF-SFE for remote sensing image-text[J]. Journal of University of Chinese Academy of Sciences, 2025, 42(2): 236–247. doi: 10.7523/j.ucas.2024.025.
    [15] SUN Yuli, LEI Lin, LI Xiao, et al. Nonlocal patch similarity based heterogeneous remote sensing change detection[J]. Pattern Recognition, 2021, 109: 107598. doi: 10.1016/j.patcog.2020.107598.
    [16] ZHANG Shun, LI Yupeng, and MEI Shaohui. Exploring uni-modal feature learning on entities and relations for remote sensing cross-modal text-image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5626317. doi: 10.1109/TGRS.2023.3333375.
    [17] ZHU Jingru, GUO Ya, SUN Geng, et al. Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5603518. doi: 10.1109/TGRS.2023.3243042.
    [18] LIU Zejun, CHEN Fanglin, XU Jun, et al. Image-text retrieval with cross-modal semantic importance consistency[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(5): 2465–2476. doi: 10.1109/TCSVT.2022.3220297.
    [19] ZHUANG Jiamin, YU Jing, DING Yang, et al. Towards fast and accurate image-text retrieval with self-supervised fine-grained alignment[J]. IEEE Transactions on Multimedia, 2024, 26: 1361–1372. doi: 10.1109/TMM.2023.3280734.
    [20] YU Hongfeng, YAO Fanglong, LU Wanxuan, et al. Text-image matching for cross-modal remote sensing image retrieval via graph neural network[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023, 16: 812–824. doi: 10.1109/JSTARS.2022.3231851.
    [21] ABDULLAH T, BAZI Y, AL RAHHAL M M, et al. TextRS: Deep bidirectional triplet network for matching text to remote sensing images[J]. Remote Sensing, 2020, 12(3): 405. doi: 10.3390/rs12030405.
    [22] TANG Xu, HUANG Dabiao, MA Jingjing, et al. Prior-experience-based vision-language model for remote sensing image-text retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5641913. doi: 10.1109/TGRS.2024.3464468.
    [23] YUAN Zhiqiang, ZHANG Wenkai, TIAN Changyuan, et al. Remote sensing cross-modal text-image retrieval based on global and local information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5620616. doi: 10.1109/TGRS.2022.3163706.
    [24] YUAN Zhiqiang, ZHANG Wenkai, FU Kun, et al. Exploring a fine-grained multiscale method for cross-modal remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 4404119. doi: 10.1109/TGRS.2021.3078451.
    [25] MI Li, DAI Xianjie, CASTILLO-NAVARRO J, et al. Knowledge-aware text-image retrieval for remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5646813. doi: 10.1109/TGRS.2024.3486977.
    [26] LU Xiaoqiang, WANG Binqiang, ZHENG Xiangtao, et al. Exploring models and data for remote sensing image caption generation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(4): 2183–2195. doi: 10.1109/TGRS.2017.2776321.
    [27] ZHANG Weihang, LI Jihao, LI Shuoke, et al. Hypersphere-based remote sensing cross-modal text–image retrieval via curriculum learning[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5621815. doi: 10.1109/TGRS.2023.3318227.
    [28] XU Xing, WANG Tan, YANG Yang, et al. Cross-modal attention with semantic consistence for image–text matching[J]. IEEE Transactions on Neural Networks and Learning Systems, 2020, 31(12): 5412–5425. doi: 10.1109/TNNLS.2020.2967597.
  • 加载中
图(8) / 表(3)
计量
  • 文章访问数:  50
  • HTML全文浏览量:  19
  • PDF下载量:  16
  • 被引次数: 0
出版历程
  • 收稿日期:  2024-10-25
  • 修回日期:  2025-05-19
  • 网络出版日期:  2025-06-04

目录

    /

    返回文章
    返回