Semi-supervised Learning Remote Sensing Image Retrieval Method Based on Triplet Sampling Graph Convolutional Network
Abstract: This paper proposes a metric learning method based on a triplet sampling graph convolutional network for semi-supervised Content-Based Image Retrieval (CBIR) of remote sensing images. The method consists of two parts: a Triplet Graph Convolutional Network (TGCN) and Graph-based Triplet Sampling (GTS). TGCN is composed of three parallel convolutional neural networks and graph convolutional networks with shared weights, which extract the initial image features and learn the graph embeddings of the images. By learning image features and graph embeddings simultaneously, TGCN obtains a graph structure that is effective for semi-supervised image retrieval. The proposed GTS algorithm then evaluates the image similarity information implicit in the graph structure to select appropriate hard triplets, and the sample set composed of these hard triplets is used to train the model effectively and quickly. The method combining TGCN and GTS is tested on two remote sensing datasets. Experimental results show that TGCN-GTS has two advantages: TGCN learns effective graph embedding features and a metric space from the images and the graph structure; and GTS effectively evaluates the image similarity information implicit in the graph structure to select appropriate hard triplets, which significantly improves semi-supervised remote sensing image retrieval performance.
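To make the idea of selecting hard triplets from a graph concrete, the following is a minimal NumPy sketch. It is not the GTS algorithm itself: the abstract does not state the selection criterion, so the rule used here (hardest positive is the same-class node with the lowest edge weight, hardest negative is the different-class node with the highest edge weight) is an assumption. The names `select_hard_triplets` and `triplet_loss`, the `margin` value, and the use of `-1` for unlabeled nodes are likewise hypothetical.

```python
import numpy as np


def select_hard_triplets(affinity, labels):
    """Pick one (anchor, positive, negative) index triple per labeled node.

    affinity: (n, n) matrix of graph edge weights, higher = more similar.
    labels:   (n,) integer class ids; -1 marks an unlabeled node (assumed
              convention), which is never used as anchor, positive or negative.
    """
    affinity = np.asarray(affinity, dtype=float)
    labels = np.asarray(labels)
    triplets = []
    for a in np.flatnonzero(labels >= 0):
        same = labels == labels[a]
        same[a] = False  # the anchor cannot be its own positive
        diff = (labels >= 0) & (labels != labels[a])
        if not same.any() or not diff.any():
            continue  # no valid triplet for this anchor
        pos_idx = np.flatnonzero(same)
        neg_idx = np.flatnonzero(diff)
        # Assumed rule: least similar positive, most similar negative.
        p = pos_idx[np.argmin(affinity[a, pos_idx])]
        n = neg_idx[np.argmax(affinity[a, neg_idx])]
        triplets.append((int(a), int(p), int(n)))
    return triplets


def triplet_loss(embeddings, triplets, margin=0.2):
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) with squared L2 distance."""
    emb = np.asarray(embeddings, dtype=float)
    a, p, n = (np.array(col) for col in zip(*triplets))
    d_ap = np.sum((emb[a] - emb[p]) ** 2, axis=1)
    d_an = np.sum((emb[a] - emb[n]) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


# Toy graph: nodes 0-2 belong to class 0, nodes 3-4 to class 1, node 5 is unlabeled.
affinity = np.array([
    [1.0, 0.9, 0.2, 0.6, 0.1, 0.5],
    [0.9, 1.0, 0.7, 0.3, 0.4, 0.5],
    [0.2, 0.7, 1.0, 0.8, 0.1, 0.5],
    [0.6, 0.3, 0.8, 1.0, 0.9, 0.5],
    [0.1, 0.4, 0.1, 0.9, 1.0, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5, 1.0],
])
labels = np.array([0, 0, 0, 1, 1, -1])
triplets = select_hard_triplets(affinity, labels)
```

In a semi-supervised setting the edge weights would come from the learned graph over both labeled and unlabeled images. This sketch only draws triplet members from labeled nodes, which is one possible design choice among several.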
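Table 1 Image retrieval mAP@40 of each algorithm on the AID and NWPU-RESISC45 datasets

| Method | AID, LTDR 5% | AID, LTDR 10% | AID, LTDR 20% | NWPU-RESISC45, LTDR 5% | NWPU-RESISC45, LTDR 10% | NWPU-RESISC45, LTDR 20% |
| --- | --- | --- | --- | --- | --- | --- |
| DML | 0.7209 | 0.7913 | 0.8779 | 0.7041 | 0.7594 | 0.8111 |
| D-CNN | 0.7521 | 0.8237 | 0.9149 | 0.7502 | 0.8229 | 0.8843 |
| SNCA | 0.7832 | 0.8265 | 0.9275 | 0.8143 | 0.8432 | 0.9091 |
| HRS2DML | 0.7876 | 0.8490 | 0.9060 | 0.8044 | 0.8505 | 0.9010 |
| TGCN-RTS | 0.7913 | 0.8576 | 0.8928 | 0.8369 | 0.8872 | 0.9044 |
| TGCN-GTS | 0.9021 | 0.9437 | 0.9712 | 0.9241 | 0.9580 | 0.9648 |

Table 2 Comparison of model complexity

| Method | NP ($\times 10^{6}$) | FLOPS ($\times 10^{9}$) |
| --- | --- | --- |
| DML | 11.2093 | 7.1258 |
| TGCN-RTS | 11.7279 | 7.2989 |
| TGCN-GTS | 11.7279 | 7.7584 |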
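Table 1 reports mAP@40, so a short sketch of how a mean average precision at cutoff K can be computed may help when reading the numbers. The exact definition and query protocol behind the table are not given here, so the choices below are assumptions: AP@K is normalized by the number of relevant items found in the top K, every image is used as a query against all others, and ranking uses squared Euclidean distance. The function names are hypothetical.

```python
import numpy as np


def average_precision_at_k(relevant, k):
    """AP@k for one query.

    relevant: ranked sequence of 0/1 flags (1 = retrieved item shares the query's class).
    Normalizes by the number of relevant items in the top k (assumed convention).
    """
    rel = np.asarray(relevant, dtype=float)[:k]
    hits = rel.sum()
    if hits == 0:
        return 0.0
    precision_at_i = np.cumsum(rel) / np.arange(1, rel.size + 1)
    return float((precision_at_i * rel).sum() / hits)


def mean_average_precision_at_k(embeddings, labels, k=40):
    """Leave-one-out mAP@k: each item queries all remaining items."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances between all embeddings.
    dists = ((emb[:, None, :] - emb[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dists, np.inf)  # push the query itself to the end of its ranking
    aps = []
    for q in range(len(labels)):
        order = np.argsort(dists[q], kind="stable")[:-1]  # drop the query (inf sorts last)
        aps.append(average_precision_at_k(labels[order] == labels[q], k))
    return float(np.mean(aps))
```

Other conventions divide by K or by the total number of relevant items in the database, which gives different values for the same ranking. Comparisons against published figures therefore only make sense once the definition is matched.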