YAO Tingting, ZHAO Hengxin, FENG Zihao, HU Qing. A Context-Aware Multiple Receptive Field Fusion Network for Oriented Object Detection in Remote Sensing Images[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240560

A Context-Aware Multiple Receptive Field Fusion Network for Oriented Object Detection in Remote Sensing Images

doi: 10.11999/JEIT240560
Funds:  The National Natural Science Foundation of China (62001078), The Fundamental Research Funds for the Central Universities (3132023249)
  • Received Date: 2024-07-04
  • Rev Recd Date: 2024-12-17
  • Available Online: 2025-01-06
Objective  Recent advances in remote sensing imaging technology have made oriented object detection in remote sensing images a prominent research area in computer vision. Unlike images in conventional object detection tasks, remote sensing images are captured from a wide-area bird's-eye view and often contain objects at diverse scales against complex backgrounds, which poses significant challenges for oriented object detection. Although current approaches have made substantial progress, existing networks do not fully exploit the contextual information across multi-scale features, resulting in classification and localization errors. To address this, a context-aware multiple receptive field fusion network is proposed that leverages the contextual correlation in multi-scale features. By enhancing the feature representation capability of the deep network, the accuracy of oriented object detection in remote sensing images is improved.

Methods  For an input remote sensing image, ResNet-50 and a feature pyramid network are first employed to extract features at different scales. The features from the first four layers are then enhanced by a receptive field expansion module, and the resulting features are processed by a high-level feature aggregation module to fuse multi-scale contextual information effectively. After the enhanced multi-scale features are obtained, a feature refinement region proposal network revises object proposals using refined feature representations, yielding more accurate candidate proposals. The multi-scale features and candidate proposals are then fed into the Oriented R-CNN detection head to produce the final detection results. The receptive field expansion module consists of two parallel submodules: a large selective kernel convolution attention submodule and a shifted-window self-attention enhancement submodule.
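The shifted-window self-attention submodule partitions the feature map into windows and attends within each one. The partition step can be sketched in isolation; the sketch below assumes a Swin-style cyclic shift, with the function name and toy grid chosen purely for illustration (they are not the authors' implementation):

```python
# Cyclic-shift window partition, as used by shifted-window self-attention.
# Toy version on (row, col) coordinates of an h x w feature grid.

def shifted_window_groups(h, w, win, shift=0):
    """Group every cell of an h x w grid into win x win windows after a
    cyclic shift of `shift` cells. Returns {window_id: [(row, col), ...]}.
    Self-attention would then run independently inside each group."""
    assert h % win == 0 and w % win == 0, "grid must tile exactly"
    groups = {}
    for r in range(h):
        for c in range(w):
            # Shift coordinates cyclically, then bucket into a window.
            rs, cs = (r - shift) % h, (c - shift) % w
            groups.setdefault((rs // win, cs // win), []).append((r, c))
    return groups

# Regular partition: cell (0, 0) stays in the top-left window.
regular = shifted_window_groups(4, 4, win=2)
# Shifted partition: the same cell wraps into a different window, so
# stacking regular and shifted layers mixes content across window borders.
shifted = shifted_window_groups(4, 4, win=2, shift=1)
```

Alternating shift = 0 and shift > 0 between successive layers lets information cross window borders while the attention cost stays proportional to the window size rather than the image size.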
The large selective kernel convolution submodule applies multiple convolutions with different kernel sizes to capture contextual information under various receptive fields, improving the network's ability to perceive multi-scale objects. The shifted-window self-attention enhancement submodule divides the feature map into patches according to predefined window and step sizes and computes a self-attention-enhanced representation of each patch, extracting more global information from the image. The high-level feature aggregation module integrates the rich semantic information of the feature pyramid network with low-level features, improving detection accuracy for multi-scale objects. Finally, the feature refinement region proposal network reduces the localization deviation between generated region proposals and the actual rotated objects in remote sensing images: deformable convolution is employed to capture geometric and contextual information, the initial proposals are refined, and the final oriented detection results are produced through a two-stage region-of-interest alignment network.

Results and Discussions  The effectiveness and robustness of the proposed network are demonstrated on two public datasets, DIOR-R and HRSC2016. On DIOR-R, the AP50, AP75, and AP50:95 metrics are used for evaluation. Quantitative and qualitative comparisons (Fig. 7) show that the proposed network markedly enhances feature representation for different remote sensing objects, distinguishing objects with similar appearances and localizing objects at various scales more accurately. On HRSC2016, the mean Average Precision (mAP) is used, with both mAP(07) and mAP(12) computed for quantitative comparison. The results (Fig. 7, Table 2) further confirm the network's effectiveness in improving ship detection accuracy in remote sensing images.
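The mAP(07) and mAP(12) protocols mentioned above differ only in how precision is interpolated over recall. A standard VOC-style sketch of the two protocols (not taken from the paper's evaluation code):

```python
# VOC-style Average Precision: mAP(07) uses 11-point interpolation, while
# mAP(12) integrates the full precision envelope over recall.

def average_precision(recalls, precisions, use_07_metric=False):
    """AP from recall/precision pairs sorted by increasing recall."""
    if use_07_metric:
        # Average the best precision at 11 evenly spaced recall levels.
        ap = 0.0
        for t in (i / 10 for i in range(11)):
            p = max((p for r, p in zip(recalls, precisions) if r >= t),
                    default=0.0)
            ap += p / 11
        return ap
    # mAP(12): make precision non-increasing, then integrate over recall.
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))
```

Because mAP(07) samples only 11 fixed recall levels, the two protocols generally report slightly different values for the same detector, which is why both are quoted on HRSC2016.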
In addition, ablation studies (Table 3) show that each module of the proposed network contributes to the improved detection performance for oriented objects in remote sensing images.

Conclusions  This paper proposes a context-aware multiple receptive field fusion network for oriented object detection in remote sensing images. The receptive field expansion module enhances the network's ability to perceive remote sensing objects of different sizes. The high-level feature aggregation module fully exploits high-level semantic information, further improving localization and classification accuracy. The feature refinement region proposal network refines the first-stage proposals, yielding more accurate detections. Qualitative and quantitative results on the DIOR-R and HRSC2016 datasets demonstrate that the proposed network outperforms existing approaches, providing superior detection results for remote sensing objects of varying scales.
[1] RAO Chaofan, WANG Jiabao, CHENG Gong, et al. Learning orientation-aware distances for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5610911. doi: 10.1109/TGRS.2023.3278933.
[2] XIE Xingxing, CHENG Gong, WANG Jiabao, et al. Oriented R-CNN for object detection[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 3520–3529. doi: 10.1109/ICCV48922.2021.00350.
[3] REN Shaoqing, HE Kaiming, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[4] YANG Xue, YANG Jirui, YAN Junchi, et al. SCRDet: Towards more robust detection for small, cluttered and rotated objects[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 8232–8241. doi: 10.1109/ICCV.2019.00832.
[5] LI Zhonghua, HOU Biao, WU Zitong, et al. FCOSR: A simple anchor-free rotated detector for aerial object detection[J]. Remote Sensing, 2023, 15(23): 5499. doi: 10.3390/rs15235499.
[6] TIAN Yang, ZHANG Mengmeng, LI Jinyu, et al. FPNFormer: Rethink the method of processing the rotation-invariance and rotation-equivariance on arbitrary-oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5605610. doi: 10.1109/TGRS.2024.3351156.
[7] MING Qi, MIAO Lingjuan, ZHOU Zhiqiang, et al. CFC-Net: A critical feature capturing network for arbitrary-oriented object detection in remote-sensing images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5605814. doi: 10.1109/TGRS.2021.3095186.
[8] REN Zhida, TANG Yongqiang, HE Zewen, et al. Ship detection in high-resolution optical remote sensing images aided by saliency information[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5623616. doi: 10.1109/TGRS.2022.3173610.
[9] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 770–778. doi: 10.1109/CVPR.2016.90.
[10] LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 936–944. doi: 10.1109/CVPR.2017.106.
[11] LUO Wenjie, LI Yujia, URTASUN R, et al. Understanding the effective receptive field in deep convolutional neural networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 4905–4913.
[12] LI Yuxuan, HOU Qibin, ZHENG Zhaohui, et al. Large selective kernel network for remote sensing object detection[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 16748–16759. doi: 10.1109/ICCV51070.2023.01540.
[13] LIU Ze, LIN Yutong, CAO Yue, et al. Swin transformer: Hierarchical vision transformer using shifted windows[C]. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 9992–10002. doi: 10.1109/ICCV48922.2021.00986.
[14] CHENG Gong, WANG Jiabao, LI Ke, et al. Anchor-free oriented proposal generator for object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5625411. doi: 10.1109/TGRS.2022.3183022.
[15] LIU Zikun, WANG Hongzhen, WENG Lubin, et al. Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds[J]. IEEE Geoscience and Remote Sensing Letters, 2016, 13(8): 1074–1078. doi: 10.1109/LGRS.2016.2565705.
[16] ZENG Ying, CHEN Yushi, YANG Xue, et al. ARS-DETR: Aspect ratio-sensitive detection transformer for aerial oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5610315. doi: 10.1109/TGRS.2024.3364713.
[17] EVERINGHAM M, VAN GOOL L, WILLIAMS C K I, et al. The pascal Visual Object Classes (VOC) challenge[J]. International Journal of Computer Vision, 2010, 88(2): 303–338. doi: 10.1007/s11263-009-0275-4.
[18] XU Yongchao, FU Mingtao, WANG Qimeng, et al. Gliding vertex on the horizontal bounding box for multi-oriented object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(4): 1452–1459. doi: 10.1109/TPAMI.2020.2974745.
[19] HAN Jiaming, DING Jian, LI Jie, et al. Align deep features for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5602511. doi: 10.1109/TGRS.2021.3062048.
[20] YANG Xue, YAN Junchi, FENG Ziming, et al. R3Det: Refined single-stage detector with feature refinement for rotating object[C]. Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021: 3163–3171. doi: 10.1609/aaai.v35i4.16426.
[21] CHEN Weining, MIAO Shencheng, WANG Guangxing, et al. Recalibrating features and regression for oriented object detection[J]. Remote Sensing, 2023, 15(8): 2134. doi: 10.3390/rs15082134.
[22] YAO Yanqing, CHENG Gong, WANG Guangxing, et al. On improving bounding box representations for oriented object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2023, 61: 5600111. doi: 10.1109/TGRS.2022.3231340.
[23] ZHAO Zifei, LI Shengyang. ABFL: Angular boundary discontinuity free loss for arbitrary oriented object detection in aerial images[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5611411. doi: 10.1109/TGRS.2024.3368630.
[24] XIE Xingxing, CHENG Gong, RAO Chaofan, et al. Oriented object detection via contextual dependence mining and penalty-incentive allocation[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 5618010. doi: 10.1109/TGRS.2024.3385985.

    Figures(7)  / Tables(3)
