A Context-Aware Multiple Receptive Field Fusion Network for Oriented Object Detection in Remote Sensing Images
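Abstract: Remote sensing images captured from a wide-range bird's-eye view typically contain many object categories, large scale variations, and rich background information, which poses great challenges for object detection. To address these imaging characteristics, this paper designs a context-aware multiple receptive field fusion network. By fully mining the contextual correlation information that remote sensing images carry in feature descriptions of different sizes within a deep network, it improves the descriptive power of the image features and thereby the accuracy of remote sensing object detection. First, a receptive field expansion module is constructed in the first four layers of the feature pyramid network; by enlarging the receptive field of the network on feature maps of different scales, it strengthens the network's ability to perceive remote sensing objects of different scales. Next, a high-level feature aggregation module is constructed, which aggregates the high-level semantic information of the feature pyramid network into the low-level features and thus effectively fuses the multi-scale contextual information contained in the feature maps. Finally, a feature refinement region proposal network is designed within a two-stage oriented object detection framework. By refining the first-stage proposals, it improves proposal accuracy and, in turn, the detection performance that the second-stage region-of-interest alignment network achieves for remote sensing objects at different imaging orientations. Qualitative and quantitative comparative experiments on the public datasets DIOR-R and HRSC2016 show that the proposed method detects remote sensing objects of different categories and scales more accurately.
Abstract: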
Objective: Recent advances in remote sensing imaging technology have made oriented object detection in remote sensing images a prominent research area in computer vision. Unlike the images used in traditional object detection tasks, remote sensing images are captured from a wide-range bird's-eye view and often contain a variety of objects with diverse scales and complex backgrounds, which poses significant challenges for oriented object detection. Although current approaches have made substantial progress, existing networks do not fully exploit the contextual information across multi-scale features, resulting in classification and localization errors during detection. To address this, a context-aware multiple receptive field fusion network is proposed, which leverages the contextual correlation in multi-scale features. By enhancing the feature representation capabilities of deep networks, it improves the accuracy of oriented object detection in remote sensing images.

Methods: For input remote sensing images, ResNet-50 and a feature pyramid network are first employed to extract features at different scales. The features from the first four layers are then enhanced using a receptive field expansion module. The resulting features are processed through a high-level feature aggregation module to effectively fuse multi-scale contextual information. After obtaining enhanced features at different scales, a feature refinement region proposal network is designed to revise object detection proposals using refined feature representations, resulting in more accurate candidate proposals. These multi-scale features and candidate proposals are then input into the Oriented R-CNN detection head to obtain the final object detection results. The receptive field expansion module consists of two submodules that operate in parallel: a large selective kernel convolution attention submodule and a shift window self-attention enhancement submodule. The large selective kernel convolution submodule introduces multiple convolution operations with different kernel sizes to capture contextual information under various receptive fields, thereby improving the network's ability to perceive multi-scale objects. The shift window self-attention enhancement submodule divides the feature map into patches according to predefined window and step sizes and calculates the self-attention-enhanced feature representation of each patch, extracting more global information from the image. The high-level feature aggregation module integrates rich semantic information from the high levels of the feature pyramid network into the low-level features, improving detection accuracy for multi-scale objects. Finally, the feature refinement region proposal network reduces the location deviation between the generated region proposals and the actual rotated objects in remote sensing images. Deformable convolution is employed to capture geometric and contextual information and refine the initial proposals, and the final oriented object detection results are produced by the second-stage region-of-interest alignment network.

Results and Discussions: The effectiveness and robustness of the proposed network are demonstrated on two public datasets: DIOR-R and HRSC2016. For the DIOR-R dataset, the AP50, AP75 and AP50:95 metrics are used for evaluation. Quantitative and qualitative comparisons (Fig. 7) demonstrate that the proposed network significantly enhances feature representation for different remote sensing objects, distinguishing objects with similar appearances and localizing objects at various scales more accurately. For the HRSC2016 dataset, the mean Average Precision (mAP) is used, and both mAP(07) and mAP(12) are computed for quantitative comparison. The results (Fig. 7, Table 2) further highlight the network's effectiveness in improving ship detection accuracy in remote sensing images. Additionally, ablation studies (Table 3) demonstrate that each module in the proposed network contributes to improved detection performance for oriented objects in remote sensing images.

Conclusions: This paper proposes a context-aware multiple receptive field fusion network for oriented object detection in remote sensing images. The network includes a receptive field expansion module that enhances the perception ability for remote sensing objects of different sizes. The high-level feature aggregation module fully utilizes high-level semantic information, further improving localization and classification accuracy. The feature refinement region proposal network refines the first-stage proposals, resulting in more accurate detection. The qualitative and quantitative results on the DIOR-R and HRSC2016 datasets demonstrate that the proposed network outperforms existing approaches, providing superior detection results for remote sensing objects of varying scales.
Key words:
- Remote sensing image
- Deep learning
- Object detection
- Multiple receptive field fusion
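
The Methods part of the abstract describes the shift window self-attention enhancement submodule as dividing the feature map into patches by window and step size and then computing self-attention within each patch. The following pure-Python sketch illustrates that idea only. It is not the authors' implementation: all function names are hypothetical, the learned query/key/value projections are replaced by the identity, and the window-shifting step itself is left out.

```python
import math


def partition_windows(fmap, window, step):
    """Cut an H x W grid of feature vectors into window x window patches.

    Returns a list of (top, left, tokens); tokens are the patch's feature
    vectors in row-major order.
    """
    height, width = len(fmap), len(fmap[0])
    patches = []
    for top in range(0, height - window + 1, step):
        for left in range(0, width - window + 1, step):
            tokens = [fmap[top + i][left + j]
                      for i in range(window) for j in range(window)]
            patches.append((top, left, tokens))
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over one patch.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained network would learn these projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        weights = softmax(scores)
        attended.append([sum(w * value[c] for w, value in zip(weights, tokens))
                         for c in range(dim)])
    return attended


def window_attention(fmap, window, step):
    """Apply self-attention independently inside every window."""
    return [(top, left, self_attention(tokens))
            for top, left, tokens in partition_windows(fmap, window, step)]


# Hypothetical 4 x 4 feature map with 2 channels per position.
feature_map = [[[float(r), float(c)] for c in range(4)] for r in range(4)]
patches = window_attention(feature_map, window=2, step=2)
print(len(patches), len(patches[0][2]))  # 4 windows, 4 tokens each
```

With a step equal to the window size the patches tile the map without overlap; a smaller step gives overlapping patches, so each position is attended to in several local contexts.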
Table 1 Quantitative comparison of different algorithms on the DIOR-R dataset (%)

| Category | Gliding Vertex[18] | Rotated Faster RCNN[3] | S2ANet[19] | R3Det[20] | EDA[21] | QPDet[22] | ABFL[23] | Proposed method |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| APL | 62.67 | 62.92 | 62.32 | 62.60 | 63.01 | 71.52 | 62.04 | 72.00 |
| APO | 38.56 | 39.94 | 43.38 | 42.98 | 36.87 | 42.01 | 42.54 | 49.49 |
| BF | 71.94 | 71.95 | 71.90 | 71.42 | 72.05 | 77.99 | 76.40 | 72.11 |
| BC | 81.20 | 81.48 | 81.32 | 81.42 | 81.42 | 81.47 | 85.33 | 81.60 |
| BR | 37.73 | 36.71 | 40.24 | 38.45 | 40.22 | 40.80 | 37.75 | 45.81 |
| CH | 72.48 | 72.54 | 75.37 | 72.63 | 72.26 | 72.64 | 74.34 | 80.51 |
| ESA | 78.62 | 77.35 | 78.17 | 78.81 | 78.04 | 77.36 | 77.97 | 80.67 |
| ETS | 69.04 | 68.75 | 69.63 | 67.60 | 69.98 | 66.69 | 69.29 | 70.14 |
| DAM | 22.81 | 25.31 | 26.47 | 27.51 | 28.63 | 31.84 | 26.78 | 29.94 |
| GF | 77.89 | 76.36 | 73.75 | 70.91 | 65.38 | 69.16 | 73.88 | 78.16 |
| GTF | 82.13 | 76.57 | 78.41 | 77.11 | 82.35 | 82.24 | 77.78 | 83.10 |
| HA | 46.22 | 45.39 | 41.82 | 39.69 | 44.86 | 42.78 | 43.15 | 46.61 |
| OP | 54.76 | 50.10 | 56.34 | 54.94 | 55.58 | 54.67 | 54.13 | 58.66 |
| SH | 81.03 | 80.93 | 80.99 | 80.26 | 81.03 | 80.90 | 84.97 | 81.19 |
| STA | 74.88 | 75.27 | 63.25 | 72.88 | 73.99 | 77.15 | 67.88 | 74.59 |
| STO | 62.54 | 62.12 | 69.72 | 61.30 | 62.57 | 62.73 | 70.04 | 62.46 |
| TC | 81.41 | 81.46 | 81.47 | 81.51 | 81.49 | 81.56 | 81.39 | 81.54 |
| TS | 54.25 | 50.25 | 52.40 | 55.72 | 59.83 | 47.77 | 54.63 | 55.88 |
| VE | 43.22 | 42.81 | 47.64 | 44.81 | 43.29 | 47.39 | 45.35 | 43.55 |
| WM | 65.13 | 63.02 | 64.42 | 64.15 | 64.79 | 64.12 | 65.01 | 66.11 |
| $\text{AP}_{50}$ | 62.91 | 62.06 | 62.95 | 62.34 | 62.88 | 63.64 | 63.53 | 65.71 |
| $\text{AP}_{75}$ | 40.00 | 39.55 | 35.85 | 38.82 | 40.02 | 36.79 | 42.68 | 46.72 |
| $\text{AP}_{50:95}$ | 38.34 | 38.22 | 36.25 | 37.84 | 38.36 | 37.51 | 40.94 | 43.17 |

Table 2 Quantitative comparison of different algorithms on the HRSC2016 dataset (%)
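Table 3 Ablation experiments on the different modules (%)

| Receptive field expansion module | High-level feature aggregation module | Feature refinement region proposal network | AP50 | AP75 | AP50:95 |
| --- | --- | --- | --- | --- | --- |
| | | | 64.06 | 43.96 | 41.10 |
| √ | | | 64.86 | 46.05 | 42.93 |
| | √ | | 64.61 | 44.80 | 41.87 |
| | | √ | 64.17 | 44.68 | 41.69 |
| √ | √ | √ | 65.71 | 46.72 | 43.17 |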
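
The tables above report AP values, and the abstract mentions mAP(07) and mAP(12) for HRSC2016 without defining them. The sketch below assumes the usual reading: the PASCAL VOC 2007 11-point interpolation and the VOC 2012 area-under-curve rule from the VOC challenge [17], with a dataset-level score taken as the mean of per-class APs. The function names and the toy detections are hypothetical; only the per-class list is copied from the proposed method's column of Table 1.

```python
def pr_curve(is_tp, num_gt):
    """Recall and precision after each detection (sorted by descending score)."""
    recall, precision, true_pos = [], [], 0
    for rank, hit in enumerate(is_tp, start=1):
        true_pos += 1 if hit else 0
        recall.append(true_pos / num_gt)
        precision.append(true_pos / rank)
    return recall, precision


def ap_voc07(recall, precision):
    """11-point interpolated AP: average the best precision at recall >= t."""
    total = 0.0
    for i in range(11):
        threshold = i / 10
        reachable = [p for r, p in zip(recall, precision) if r >= threshold]
        total += max(reachable, default=0.0)
    return total / 11


def ap_voc12(recall, precision):
    """All-point AP: area under the monotonically decreasing precision envelope."""
    r = [0.0] + list(recall) + [1.0]
    p = [0.0] + list(precision) + [0.0]
    for i in range(len(p) - 2, -1, -1):  # make precision non-increasing
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Toy input (made up): 5 ranked detections, 4 ground-truth objects.
recall, precision = pr_curve([True, True, False, True, False], num_gt=4)
ap07 = ap_voc07(recall, precision)
ap12 = ap_voc12(recall, precision)

# Per-class values of the proposed method from Table 1, in row order APL ... WM.
PROPOSED_PER_CLASS = [
    72.00, 49.49, 72.11, 81.60, 45.81, 80.51, 80.67, 70.14, 29.94, 78.16,
    83.10, 46.61, 58.66, 81.19, 74.59, 62.46, 81.54, 55.88, 43.55, 66.11,
]
class_mean = round(sum(PROPOSED_PER_CLASS) / len(PROPOSED_PER_CLASS), 2)
print(round(ap07, 4), ap12, class_mean)
```

On the toy input the two protocols give roughly 0.682 and 0.6875, which shows why the same detector can score differently under mAP(07) and mAP(12). The mean of the 20 per-class values rounds to 65.71, the AP50 figure reported for the proposed method in Table 1.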
[1] RAO Chaofan, WANG Jiabao, CHENG Gong, et al. Learning orientation-aware distances for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5610911. doi: 10.1109/TGRS.2023.3278933.
[2] XIE Xingxing, CHENG Gong, WANG Jiabao, et al. Oriented R-CNN for object detection[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 3520–3529. doi: 10.1109/ICCV48922.2021.00350.
[3] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[4] YANG Xue, YANG Jirui, YAN Junchi, et al. SCRDet: Towards more robust detection for small, cluttered and rotated objects[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 8232–8241. doi: 10.1109/ICCV.2019.00832.
[5] LI Zhonghua, HOU Biao, WU Zitong, et al. FCOSR: A simple anchor-free rotated detector for aerial object detection[J]. Remote Sensing, 2023, 15(23): 5499. doi: 10.3390/rs15235499.
[6] TIAN Yang, ZHANG Mengmeng, LI Jinyu, et al. FPNFormer: Rethink the method of processing the rotation-invariance and rotation-equivariance on arbitrary-oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5605610. doi: 10.1109/TGRS.2024.3351156.
[7] MING Qi, MIAO Lingjuan, ZHOU Zhiqiang, et al. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5605814. doi: 10.1109/TGRS.2021.3095186.
[8] REN Zhida, TANG Yongqiang, HE Zewen, et al. Ship detection in high-resolution optical remote sensing images aided by saliency information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5623616. doi: 10.1109/TGRS.2022.3173610.
[9] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[10] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 936–944. doi: 10.1109/CVPR.2017.106.
[11] LUO Wenjie, LI Yujia, URTASUN R, et al. Understanding the effective receptive field in deep convolutional neural networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 4905–4913.
[12] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
[13] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9992–10002. doi: 10.1109/ICCV48922.2021.00986.
[14] CHENG Gong, WANG Jiabao, LI Ke, et al. Anchor-free oriented proposal generator for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5625411. doi: 10.1109/TGRS.2022.3183022.
[15] LIU Zikun, WANG Hongzhen, WENG Lubin, et al. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(8): 1074–1078. doi: 10.1109/LGRS.2016.2565705.
[16] ZENG Ying, CHEN Yushi, YANG Xue, et al. ARS-DETR: Aspect ratio-sensitive detection transformer for aerial oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5610315. doi: 10.1109/TGRS.2024.3364713.
[17] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4.
[18] XU Yongchao, FU Mingtao, WANG Qimeng, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452–1459. doi: 10.1109/TPAMI.2020.2974745.
[19] HAN Jiaming, DING Jian, LI Jie, et al. Align deep features for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5602511. doi: 10.1109/TGRS.2021.3062048.
[20] YANG Xue, YAN Junchi, FENG Ziming, et al. R3Det: Refined single-stage detector with feature refinement for rotating object[C]. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021: 3163–3171. doi: 10.1609/aaai.v35i4.16426.
[21] CHEN Weining, MIAO Shencheng, WANG Guangxing, et al. Recalibrating features and regression for oriented object detection[J]. Remote Sensing, 2023, 15(8): 2134. doi: 10.3390/rs15082134.
[22] YAO Yanqing, CHENG Gong, WANG Guangxing, et al. On improving bounding box representations for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5600111. doi: 10.1109/TGRS.2022.3231340.
[23] ZHAO Zifei and LI Shengyang. ABFL: Angular boundary discontinuity free loss for arbitrary oriented object detection in aerial images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5611411. doi: 10.1109/TGRS.2024.3368630.
[24] XIE Xingxing, CHENG Gong, RAO Chaofan, et al. Oriented object detection via contextual dependence mining and penalty-incentive allocation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5618010. doi: 10.1109/TGRS.2024.3385985.