A Discriminative Feature Representation Method Based on Dual Attention Mechanism for Remote Sensing Image Scene Classification
-
Abstract: To address the classification confusion caused by large intra-class differences and high inter-class similarity in remote sensing image scene classification, this paper proposes a discriminative feature representation method based on a dual attention mechanism. Because the features carried by different channels differ in importance and different local regions differ in saliency, a channel-wise attention module and a spatial-wise attention module are designed on top of the high-level features extracted by a Convolutional Neural Network (CNN). Exploiting the contextual-information modeling ability of Recurrent Neural Networks (RNN), the two modules sequentially learn and output importance weights for the channels and the local regions, so that the network attends to salient features and salient regions while suppressing non-salient ones, thereby enhancing the discriminative power of the feature representation. The proposed dual attention module can be attached to the last convolutional layer of any CNN, and the whole network can be trained end-to-end. Extensive comparative experiments on the two public datasets AID and NWPU45 verify the effectiveness of the proposed method; compared with existing methods, it achieves a clear improvement in classification accuracy.
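The attention design described in the abstract (a channel-wise module and a spatial-wise module that use a recurrent network to score the importance of channels and spatial locations of the last convolutional feature map, followed by end-to-end classification) can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the authors' exact implementation: the choice of a GRU, the hidden size of 128, global average pooling to form per-channel descriptors, the sigmoid gating, the ResNet-50 backbone, and the class names (ChannelAttention, SpatialAttention, DualAttentionClassifier) are assumptions made for illustration only.

```python
# Minimal sketch of a dual (channel-wise + spatial-wise) attention head on top of a CNN.
# All hyperparameters and module names below are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torchvision.models as models


class ChannelAttention(nn.Module):
    """Scores each channel with a GRU that scans the channel dimension."""
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, _, _ = x.shape
        desc = x.mean(dim=(2, 3)).unsqueeze(-1)        # (B, C, 1): one descriptor per channel
        out, _ = self.gru(desc)                        # (B, C, hidden): contextual channel encoding
        w = torch.sigmoid(self.fc(out))                # (B, C, 1): channel importance weights
        return x * w.view(b, c, 1, 1)                  # re-weight the channels


class SpatialAttention(nn.Module):
    """Scores each spatial location with a GRU that scans the H*W positions."""
    def __init__(self, channels, hidden=128):
        super().__init__()
        self.gru = nn.GRU(input_size=channels, hidden_size=hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)             # (B, H*W, C): one feature vector per location
        out, _ = self.gru(seq)                         # (B, H*W, hidden)
        a = torch.sigmoid(self.fc(out))                # (B, H*W, 1): location importance weights
        return x * a.transpose(1, 2).reshape(b, 1, h, w)


class DualAttentionClassifier(nn.Module):
    """Backbone conv features -> channel attention -> spatial attention -> classifier."""
    def __init__(self, num_classes=45):               # 45 classes matches NWPU45
        super().__init__()
        backbone = models.resnet50(weights=None)       # requires torchvision >= 0.13
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # keep the last conv feature map
        self.ca = ChannelAttention(channels=2048)
        self.sa = SpatialAttention(channels=2048)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x):
        f = self.features(x)                           # (B, 2048, 7, 7) for a 224x224 input
        f = self.sa(self.ca(f))                        # apply the two attention branches in sequence
        return self.head(f.mean(dim=(2, 3)))           # global average pooling + linear classifier


if __name__ == "__main__":
    model = DualAttentionClassifier(num_classes=45)
    logits = model(torch.randn(2, 3, 224, 224))
    print(logits.shape)                                # torch.Size([2, 45])
```

Because both branches only re-weight the feature map, the module can be appended to the last convolutional layer of any backbone and the whole network remains trainable end-to-end, as stated in the abstract.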
-
Table 1 OA (%) comparison for the ablation study on the AID and NWPU45 datasets

| Method | AID (20%) | AID (50%) | NWPU45 (10%) | NWPU45 (20%) |
| --- | --- | --- | --- | --- |
| VGG16 | 86.59±0.29 | 89.64±0.30 | 87.15±0.45 | 90.36±0.18 |
| VGG16+CA | 87.73±0.19 | 89.98±0.25 | 88.54±0.39 | 90.89±0.23 |
| VGG16+SA | 89.36±0.21 | 94.06±0.19 | 93.23±0.21 | 95.05±0.18 |
| VGG16+CA+SA | 89.87±0.30 | 94.58±0.23 | 97.89±0.12 | 98.82±0.20 |
| ResNet50 | 86.48±0.49 | 89.22±0.34 | 89.88±0.26 | 92.35±0.19 |
| ResNet50+CA | 88.23±0.34 | 91.45±0.30 | 91.52±0.19 | 93.48±0.21 |
| ResNet50+SA | 90.83±0.55 | 94.46±0.48 | 97.56±0.08 | 98.79±0.04 |
| ResNet50+CA+SA | 91.34±0.38 | 95.22±0.36 | 98.55±0.11 | 99.07±0.23 |

Table 2 OA (%) comparison between the proposed method and baseline methods on the AID dataset
| Method | Year | AID (20%) | AID (50%) |
| --- | --- | --- | --- |
| VGG16 [16] | 2017 | 86.59±0.29 | 89.64±0.30 |
| CaffeNet [16] | 2017 | 86.86±0.47 | 89.53±0.31 |
| GoogLeNet [16] | 2017 | 83.44±0.40 | 86.39±0.55 |
| Fusion-by-add [19] | 2017 | – | 91.87±0.36 |
| MCNN [11] | 2018 | – | 91.80±0.22 |
| ARCNet [12] | 2019 | 88.75±0.40 | 93.10±0.55 |
| Finetune_ResNet50 [14] | 2019 | 86.48±0.49 | 89.22±0.34 |
| ResNet_LGFFE [14] | 2019 | 90.83±0.55 | 94.46±0.48 |
| VGG16+CA+SA | This paper | 89.87±0.30 | 94.58±0.23 |
| ResNet50+CA+SA | This paper | 91.34±0.38 | 95.22±0.36 |

Table 3 OA (%) comparison between the proposed method and baseline methods on the NWPU45 dataset
| Method | Year | NWPU45 (10%) | NWPU45 (20%) |
| --- | --- | --- | --- |
| AlexNet [17] | 2017 | 81.22±0.19 | 85.16±0.18 |
| VGG_16 [17] | 2017 | 87.15±0.45 | 90.36±0.18 |
| GoogLeNet [17] | 2017 | 86.02±0.18 | 86.02±0.18 |
| D_CNN [11] | 2018 | 89.22±0.50 | 91.89±0.22 |
| LGFF [20] | 2018 | 93.61±0.10 | 96.37±0.05 |
| Ref. [21] | 2019 | 91.73±0.21 | 93.47±0.30 |
| Finetune_ResNet50 [14] | 2019 | 89.88±0.26 | 92.35±0.19 |
| ResNet_LGFFE [14] | 2019 | 97.56±0.08 | 98.79±0.04 |
| VGG16+CA+SA | This paper | 97.89±0.12 | 98.82±0.20 |
| ResNet50+CA+SA | This paper | 98.55±0.11 | 99.07±0.23 |

-
[1] CHI Mingmin, PLAZA A, BENEDIKTSSON J A, et al. Big data for remote sensing: Challenges and opportunities[J]. Proceedings of the IEEE, 2016, 104(11): 2207–2219. doi: 10.1109/JPROC.2016.2598228
[2] ZHANG Liangpei, ZHANG Lefei, and DU Bo. Deep learning for remote sensing data: A technical tutorial on the state of the art[J]. IEEE Geoscience and Remote Sensing Magazine, 2016, 6(4): 22–40. doi: 10.1109/MGRS.2016.2540798
[3] CHENG Gong, MA Chengcheng, ZHOU Peicheng, et al. Scene classification of high resolution remote sensing images using convolutional neural networks[C]. 2016 IEEE International Geoscience and Remote Sensing Symposium, Beijing, China, 2016: 767–770. doi: 10.1109/IGARSS.2016.7729193
[4] SZEGEDY C, LIU Wei, JIA Yangqing, et al. Going deeper with convolutions[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 1–9. doi: 10.1109/CVPR.2015.7298594
[5] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90
[6] KRIZHEVSKY A, SUTSKEVER I, and HINTON G E. ImageNet classification with deep convolutional neural networks[C]. The 25th International Conference on Neural Information Processing Systems, Lake Tahoe, USA, 2012: 1097–1105.
[7] HU Fan, XIA Guisong, YANG Wen, et al. Recent advances and opportunities in scene classification of aerial images with deep models[C]. 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 2018: 4371–4374. doi: 10.1109/IGARSS.2018.8518336
[8] CHENG Gong, YANG Ceyuan, YAO Xiwen, et al. When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(5): 2811–2821. doi: 10.1109/TGRS.2017.2783902
[9] LI Peng, REN Peng, ZHANG Xiaoyu, et al. Region-wise deep feature representation for remote sensing images[J]. Remote Sensing, 2018, 10(6): 871. doi: 10.3390/rs10060871
[10] LIU Yanfei, ZHONG Yanfei, and QIN Qianqing. Scene classification based on multiscale convolutional neural network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2018, 56(12): 7109–7121. doi: 10.1109/TGRS.2018.2848473
[11] YUAN Yuan, FANG Jie, LU Xiaoqiang, et al. Remote sensing image scene classification using rearranged local features[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(3): 1779–1792. doi: 10.1109/TGRS.2018.2869101
[12] WANG Qi, LIU Shaoteng, CHANUSSOT J, et al. Scene classification with recurrent attention of VHR remote sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2019, 57(2): 1155–1167. doi: 10.1109/TGRS.2018.2864987
[13] XIONG Wei, LV Yafei, CUI Yaqi, et al. A discriminative feature learning approach for remote sensing image retrieval[J]. Remote Sensing, 2019, 11(3): 281. doi: 10.3390/rs11030281
[14] LV Yafei, ZHANG Xiaohan, XIONG Wei, et al. An end-to-end local-global-fusion feature extraction network for remote sensing image scene classification[J]. Remote Sensing, 2019, 11(24): 3006. doi: 10.3390/rs11243006
[15] CHO K, VAN MERRIËNBOER B, GULCEHRE C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[C]. 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 2014: 1724–1734. doi: 10.3115/v1/D14-1179
[16] XIA Guisong, HU Jingwen, HU Fan, et al. AID: A benchmark data set for performance evaluation of aerial scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(7): 3965–3981. doi: 10.1109/TGRS.2017.2685945
[17] CHENG Gong, HAN Junwei, and LU Xiaoqiang. Remote sensing image scene classification: Benchmark and state of the art[J]. Proceedings of the IEEE, 2017, 105(10): 1865–1883. doi: 10.1109/JPROC.2017.2675998
[18] SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]. The 3rd International Conference on Learning Representations, San Diego, USA, 2015: 7–12.
[19] CHAIB S, LIU Huan, GU Yanfeng, et al. Deep feature fusion for VHR remote sensing scene classification[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(8): 4775–4784. doi: 10.1109/TGRS.2017.2700322
[20] ZHU Qiqi, ZHONG Yanfei, LIU Yanfei, et al. A deep-local-global feature fusion framework for high spatial resolution imagery scene classification[J]. Remote Sensing, 2018, 10(4): 568. doi: 10.3390/rs10040568
[21] YE Lihua, WANG Lei, ZHANG Wenwen, et al. Deep metric learning method for high resolution remote sensing image scene classification[J]. Acta Geodaetica et Cartographica Sinica, 2019, 48(6): 698–707. doi: 10.11947/j.AGCS.2019.20180434