Remote Sensing Image Retrieval Based on Multi-scale Pooling and Norm Attention Mechanism

GE Yun; MA Lin; YE Famao; CHU Jun

doi:10.11999/JEIT210052

cstr: 32379.14.JEIT210052

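Details

Author biographies:
GE Yun: female, born in 1983, Ph.D., associate professor. Research interests: remote sensing image processing and machine learning.

MA Lin: female, born in 1996, master's degree. Research interests: remote sensing image processing and machine learning.

YE Famao: male, born in 1978, Ph.D., associate professor. Research interests: remote sensing image processing, computer graphics, and machine learning.

CHU Jun: female, born in 1967, Ph.D., professor. Research interests: image processing, computer vision, computer graphics, and data fusion.

Corresponding author:
CHU Jun, chujun99602@163.com

CLC number: TN911.73; TP751.1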
Publication history
- Received: 2021-01-18
- Revised: 2021-07-20
- Available online: 2021-07-29
- Published: 2022-02-25

Remote Sensing Image Retrieval Based on Multi-scale Pooling and Norm Attention Mechanism

GE Yun¹, MA Lin¹, YE Famao², CHU Jun¹

1. School of Software, Nanchang Hangkong University, Nanchang 330063, China
2. School of Surveying and Mapping Engineering, East China University of Technology, Nanchang 330013, China

Funds: The National Natural Science Foundation of China (41801288, 41261091), The Natural Science Foundation of Jiangxi Province (20202BAB212011, 20202BABL202030), The Key Research and Development Project of Jiangxi Province (20192BBE50073, 20203BBGL73222)

Abstract: Remote sensing images are rich in content, so features extracted by general deep models are easily disturbed by complex backgrounds: key features are not extracted well, and the spatial information of the image is difficult to express. This paper proposes a deep convolutional neural network based on multi-scale pooling and a norm attention mechanism, which adaptively weights salient features at both the channel level and the spatial level. First, in the multi-scale pooling channel attention module, max pooling at different scales is applied to the feature map of each channel, following the idea of spatial pyramid pooling. Next, adaptive average pooling converts the feature maps of different sizes to a uniform size, so that salient features at different scales can be attended to through element-wise addition. Then, in the norm spatial attention module, the pixels at the same spatial position across all channels are formed into a vector, and a feature map carrying spatial information is obtained by computing the L1 norm and L2 norm of this group of vectors. Finally, cascaded pooling is used to refine the high-level features, which are then used for remote sensing image retrieval. Experiments are conducted on three data sets: UC Merced, AID, and NWPU-RESISC45. The results show that the proposed attention model improves retrieval performance by attending to salient features at different scales and incorporating spatial information.
Key words:
- Remote sensing image retrieval
- Spatial pyramid
- Norm
- Attention mechanism
- Cascading pooling

Figure 1 Examples of remote sensing images with high inter-class similarity

Figure 2 Structure of the multi-scale pooling and norm attention mechanism model

Figure 3 Multi-scale pooling channel attention module
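
The abstract describes this channel attention module in three steps: per-channel max pooling at several scales, adaptive average pooling to a common size, and element-wise addition. The NumPy sketch below walks through those steps under stated assumptions and is not the authors' implementation. The pyramid sizes `(1, 2, 4)`, the 1×1 target size, the sigmoid gate, and the names `adaptive_pool` and `multiscale_channel_attention` are illustrative choices that do not come from the paper, and any learned layers the real module may contain are omitted.

```python
import numpy as np


def adaptive_pool(x, out_size, reduce_fn):
    """Pool a (C, H, W) array to (C, out_size, out_size) with reduce_fn per bin."""
    channels, height, width = x.shape
    out = np.empty((channels, out_size, out_size), dtype=np.float64)
    for i in range(out_size):
        # Bin edges: floor for the start index, ceil for the end index.
        r0 = (i * height) // out_size
        r1 = -((-(i + 1) * height) // out_size)
        for j in range(out_size):
            c0 = (j * width) // out_size
            c1 = -((-(j + 1) * width) // out_size)
            out[:, i, j] = reduce_fn(x[:, r0:r1, c0:c1], axis=(1, 2))
    return out


def multiscale_channel_attention(x, pyramid_sizes=(1, 2, 4)):
    """Return (reweighted features, channel weights) for a (C, H, W) array.

    pyramid_sizes is an assumed setting, not a value taken from the paper.
    """
    x = np.asarray(x, dtype=np.float64)
    fused = np.zeros((x.shape[0], 1, 1))
    for k in pyramid_sizes:
        # Step 1: max pooling at scale k gives a (C, k, k) map.
        pooled = adaptive_pool(x, k, np.max)
        # Step 2: adaptive average pooling brings every scale to (C, 1, 1).
        # Step 3: element-wise addition fuses the scales.
        fused += adaptive_pool(pooled, 1, np.mean)
    # Assumed gating: a plain sigmoid turns the fused response into weights.
    weights = 1.0 / (1.0 + np.exp(-fused))
    return x * weights, weights
```

Calling `multiscale_channel_attention(features)` on a `(C, H, W)` array returns an array of the same shape together with one weight per channel; channels with stronger pooled responses at any scale receive larger weights.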

Figure 4 Spatial attention module
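
According to the abstract, the norm spatial attention module treats the pixels at one spatial position across all channels as a vector and computes the L1 and L2 norms of these vectors. The sketch below illustrates that computation in NumPy. How the two norm maps are fused is not stated in this page, so the weighted sum controlled by the hypothetical `fuse_weights` parameter and the sigmoid gate are stand-ins for whatever learned fusion the paper uses.

```python
import numpy as np


def norm_spatial_attention(x, fuse_weights=(0.5, 0.5)):
    """Return (reweighted features, L1 map, L2 map) for a (C, H, W) array.

    fuse_weights is a hypothetical stand-in for a learned fusion layer.
    """
    x = np.asarray(x, dtype=np.float64)
    # Each spatial position (h, w) defines a C-dimensional vector x[:, h, w].
    l1_map = np.abs(x).sum(axis=0)           # L1 norm per position, shape (H, W)
    l2_map = np.sqrt((x ** 2).sum(axis=0))   # L2 norm per position, shape (H, W)
    # Assumed fusion: weighted sum of the two norm maps, then a sigmoid gate.
    fused = fuse_weights[0] * l1_map + fuse_weights[1] * l2_map
    attention = 1.0 / (1.0 + np.exp(-fused))
    # Broadcast the (H, W) attention map over all channels.
    return x * attention[None, :, :], l1_map, l2_map
```

Positions whose channel vectors have large norms receive attention values close to 1, while positions with all-zero responses receive 0.5 under this particular gating.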

Figure 5 Transfer learning process

Figure 6 Example images

Figure 7 Differences in feature maps among different methods

Figure 8 P-R curves

Table 1 Retrieval results of different methods on the UC Merced and AID data sets

Method	UC Merced mAP	UC Merced ANMRR	AID mAP	AID ANMRR
Resnet50-cp	0.812	0.163	0.850	0.142
Resnet50_CBAM-cp	0.870	0.110	0.920	0.083
Resnet50_C-cp	0.898	0.084	0.935	0.073
Resnet50_S-cp	0.892	0.073	0.936	0.074
Resnet50_SC-cp	0.924	0.059	0.940	0.068
Note: bold type in the original marks the best result in each column.
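
The two metrics in Table 1 move in opposite directions: a higher mAP is better, while a lower ANMRR is better. The sketch below shows one common way to compute both from ranked relevance flags. It assumes the MPEG-7 style definition of ANMRR with K = min(4·NG, 2·GTM) and a ranked list that covers the whole database; the exact settings used in the paper are not given in this page, and the function names are illustrative.

```python
def average_precision(relevance):
    """AP for one query; relevance is a ranked list of 0/1 flags."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / hits if hits else 0.0


def nmrr(relevance, gtm=None):
    """Normalized modified retrieval rank for one query (0 is best, 1 is worst)."""
    ng = sum(relevance)  # number of ground-truth items for this query
    if ng == 0:
        raise ValueError("query has no relevant items")
    gtm = ng if gtm is None else gtm  # largest NG over all queries
    k = min(4 * ng, 2 * gtm)
    # Relevant items ranked beyond K are penalized with rank 1.25 * K.
    ranks = [r if r <= k else 1.25 * k
             for r, rel in enumerate(relevance, start=1) if rel]
    avr = sum(ranks) / ng
    mrr = avr - 0.5 * (1 + ng)
    return mrr / (1.25 * k - 0.5 * (1 + ng))


def evaluate(all_relevance):
    """Return (mAP, ANMRR) averaged over a list of per-query relevance lists."""
    gtm = max(sum(r) for r in all_relevance)
    m_ap = sum(average_precision(r) for r in all_relevance) / len(all_relevance)
    anmrr = sum(nmrr(r, gtm) for r in all_relevance) / len(all_relevance)
    return m_ap, anmrr
```

For a query whose relevant images all appear at the top of the ranking, `average_precision` returns 1.0 and `nmrr` returns 0.0, which matches the directions of improvement in the table.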

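Table 2 Comparison of the average retrieval time of different methods (ms)

Method	Average retrieval time (ms)
Resnet50	2.17
Resnet50_CBAM	2.18
Resnet50_SC	2.18
Note: bold type in the original marks the best result in each column.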

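Table 3 Retrieval results of transferred features

Method	Global pooling mAP	Global pooling ANMRR	Cascaded pooling mAP	Cascaded pooling ANMRR
Resnet50_CBAM	0.763	0.190	0.790	0.168
Resnet50_C	0.800	0.161	0.809	0.154
Resnet50_S	0.789	0.169	0.812	0.149
Resnet50_SC	0.818	0.146	0.827	0.138
Note: bold type in the original marks the best result in each column.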
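
Table 3 contrasts global pooling with cascaded pooling as ways to turn a high-level feature map into a retrieval descriptor. The sketch below illustrates the difference under an assumption: cascaded pooling is taken to mean local max pooling followed by average pooling over the max-pooled map, in the spirit of the cascade pooling of reference [22]. The window size, stride, and function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def global_average_pooling(x):
    """Average each channel of a (C, H, W) array into a C-dimensional descriptor."""
    return x.mean(axis=(1, 2))


def cascaded_pooling(x, window=2, stride=1):
    """Assumed cascade: local max pooling, then average pooling of the result."""
    _, height, width = x.shape
    rows = range(0, height - window + 1, stride)
    cols = range(0, width - window + 1, stride)
    # Stage 1: max over each local window, giving shape (n_rows, n_cols, C).
    maxed = np.array([[x[:, r:r + window, c:c + window].max(axis=(1, 2))
                       for c in cols] for r in rows])
    # Stage 2: average the local maxima into one value per channel.
    return maxed.mean(axis=(0, 1))
```

Both functions return one value per channel; the cascaded variant keeps the strongest local responses before averaging, so isolated salient activations are diluted less than under plain global averaging.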

Table 4 Comparison of mAP with other methods

Method	UC Merced	AID
ResNet_CBAM	0.869	0.920
Ref. [21]	0.840	–
Ref. [20]	–	0.918
Ref. [6]	0.916	0.926
Resnet50_SC (this paper)	0.924	0.940
Note: bold type in the original marks the best result in each column.

References (26)

[1]	SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J]. Computer Science, arXiv: 1409.1556, 2014.
[2]	HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770–778.
[3]	HU Jie, SHEN Li, ALBANIE S, et al. Squeeze-and-excitation networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(8): 2011–2023. doi: 10.1109/TPAMI.2019.2913372
[4]	YE Famao, DONG Meng, LUO Wei, et al. Agricultural remote sensing image retrieval based on convolutional neural network and reranking[J]. Transactions of the Chinese Society of Agricultural Engineering, 2019, 35(15): 138–145. doi: 10.11975/j.issn.1002-6819.2019.15.018
[5]	LI Yansheng, ZHANG Yongjun, HUANG Xin, et al. Large-scale remote sensing image retrieval by deep hashing neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(2): 950–965. doi: 10.1109/TGRS.2017.2756911
[6]	ROY S, SANGINETO E, DEMIR B, et al. Metric-learning-based deep hashing network for content-based retrieval of remote sensing images[J]. IEEE Geoscience and Remote Sensing Letters, 2021, 18(2): 226–230. doi: 10.1109/LGRS.2020.2974629
[7]	PENG Yanfei, SONG Xiaonan, WU Hong, et al. Remote sensing image retrieval combined with deep learning and relevance feedback[J]. Journal of Image and Graphics, 2019, 24(3): 420–434. doi: 10.11834/jig.180384
[8]	YE Famao, ZHAO Xuqing, LUO Wei, et al. Query-adaptive remote sensing image retrieval based on image rank similarity and image-to-query class similarity[J]. IEEE Access, 2020, 8: 116824–116839. doi: 10.1109/ACCESS.2020.3004360
[9]	MA Chenhui, MU Xiaodong, and SHA Dexuan. Multi-layers feature fusion of convolutional neural network for scene classification of remote sensing[J]. IEEE Access, 2019, 7: 121685–121694. doi: 10.1109/ACCESS.2019.2936215
[10]	LIU Yishu, CHEN Conghui, HAN Zhengzhuo, et al. High-resolution remote sensing image retrieval based on classification-similarity networks and double fusion[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2020, 13: 1119–1133. doi: 10.1109/JSTARS.2020.2981372
[11]	CHU Jun, ZHU Xiaoyang, LENG Lu, et al. Target detector with channel attention and residual learning[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(10): 889–897. doi: 10.16451/j.cnki.issn1003-6059.202010003
[12]	WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]. 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 3–19.
[13]	ZHAI Pengbo, YANG Hao, SONG Tingting, et al. Two-path semantic segmentation algorithm combining attention mechanism[J]. Journal of Image and Graphics, 2020, 25(8): 1627–1636. doi: 10.11834/jig.190533
[14]	WANG Qi, LIU Shaoteng, CHANUSSOT J, et al. Scene classification with recurrent attention of VHR remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(2): 1155–1167. doi: 10.1109/TGRS.2018.2864987
[15]	GUO Yiyou, JI Jingsheng, LU Xiankai, et al. Global-local attention network for aerial scene classification[J]. IEEE Access, 2019, 7: 67200–67212. doi: 10.1109/ACCESS.2019.2918732
[16]	ZHANG Shu, YUAN Qiangqiang, LI Jie, et al. Scene-adaptive remote sensing image super-resolution using a multiscale attention network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2020, 58(7): 4764–4779. doi: 10.1109/TGRS.2020.2966805
[17]	XU Congan, LÜ Yafei, ZHANG Xiaohan, et al. A discriminative feature representation method based on dual attention mechanism for remote sensing image scene classification[J]. Journal of Electronics & Information Technology, 2021, 43(3): 683–691. doi: 10.11999/JEIT200568
[18]	ZHANG Yongmei, XU Min, and LI Xiaodong. Remote sensing image retrieval based on DenseNet model and CBAM[C]. 2020 IEEE 3rd International Conference on Computer and Communication Engineering Technology (CCET), Beijing, China, 2020: 86–90. doi: 10.1109/CCET50901.2020.9213121
[19]	WANG Yameng, JI Shunping, LU Meng, et al. Attention boosted bilinear pooling for remote sensing image retrieval[J]. International Journal of Remote Sensing, 2020, 41(7): 2704–2724. doi: 10.1080/01431161.2019.1697010
[20]	LIU Chao, MA Jingjing, TANG Xu, et al. Deep hash learning for remote sensing image retrieval[J]. IEEE Transactions on Geoscience and Remote Sensing, 2021, 59(4): 3420–3443. doi: 10.1109/TGRS.2020.3007533
[21]	XIONG Wei, LÜ Yafei, CUI Yaqi, et al. A discriminative feature learning approach for remote sensing image retrieval[J]. Remote Sensing, 2019, 11(3): 281. doi: 10.3390/rs11030281
[22]	GE Yun, TANG Yiling, JIANG Shunliang, et al. Region-based cascade pooling of convolutional features for HRRS image retrieval[J]. Remote Sensing Letters, 2018, 9(10): 1002–1010. doi: 10.1080/2150704X.2018.1504334
[23]	CHENG Gong, HAN Junwei, and LU Xiaoqiang. Remote sensing image scene classification: Benchmark and state of the art[J]. Proceedings of the IEEE, 2017, 105(10): 1865–1883. doi: 10.1109/JPROC.2017.2675998
[24]	SUN Yuechi and LI Guan. Abnormal behavior detection of crowds based on nested model of convolutional neural network[J]. Computer Applications and Software, 2019, 36(3): 196–201, 276. doi: 10.3969/j.issn.1000-386x.2019.03.036
[25]	YANG Yi and NEWSAM S. Geographic image retrieval using local invariant features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(2): 818–832. doi: 10.1109/TGRS.2012.2205158
[26]	XIA Guisong, HU Jingwen, HU Fan, et al. AID: A benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965–3981. doi: 10.1109/TGRS.2017.2685945