Citation: CHEN Xiaolei, SHEN Yujie, ZHONG Zhihua. Spherical Geometry-Guided and Frequency-Enhanced Segment Anything Model for 360° Salient Object Detection[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251254

Spherical Geometry-Guided and Frequency-Enhanced Segment Anything Model for 360° Salient Object Detection

doi: 10.11999/JEIT251254 cstr: 32379.14.JEIT251254
  • Accepted Date: 2026-01-26
  • Rev Recd Date: 2026-01-26
  • Available Online: 2026-02-12
  • Objective: With the rapid development of VR and AR technologies and the increasing demand for omnidirectional visual applications, accurate salient object detection in complex 360° scenes has become critical for system stability and intelligent decision-making. The Segment Anything Model (SAM) demonstrates strong transferability across 2D vision tasks; however, it is primarily designed for planar images and lacks explicit modeling of spherical geometry, which limits its direct applicability to 360° salient object detection (360° SOD). To address this challenge, this work explores integrating SAM's generalization capability with spherical-aware multi-scale geometric modeling to advance 360° SOD. Specifically, a Multi-Cognitive Adapter (MCA), Spherical Geometry-Guided Attention (SGGA), and a Spatial-Frequency Joint Perception Module (SFJPM) are introduced to enhance multi-scale structural representation, alleviate projection-induced geometric distortions and boundary discontinuities, and strengthen joint global–local feature modeling.

  • Methods: The proposed 360° SOD framework is built upon SAM and consists of an image encoder and a mask decoder. During encoding, spherical geometry modeling is incorporated into patch embedding by mapping image patches onto a unit sphere and explicitly modeling spatial relationships between patch centers, injecting geometric priors into the attention mechanism. This design enhances sensitivity to non-uniform geometric characteristics and mitigates information loss caused by omnidirectional projection distortions. The encoder adopts a partial freezing strategy and is organized into four stages, each containing three encoder blocks. Each block integrates MCA for multi-scale contextual fusion and SGGA to model long-range dependencies in spherical space. Multi-level features are concatenated along the channel dimension to form a unified representation, which is further enhanced by the SFJPM to jointly capture spatial structures and frequency-domain global information. The fused features are then fed into the SAM mask decoder, where saliency maps are optimized under ground-truth supervision to achieve accurate localization and boundary refinement.

  • Results and Discussions: Experiments are conducted using the PyTorch framework on an RTX 3090 GPU with an input resolution of 512×512. Evaluations on two public datasets (360-SOD and 360-SSOD) against 14 state-of-the-art methods demonstrate that the proposed approach consistently achieves superior performance across six evaluation metrics. On the 360-SOD dataset, the model attains an MAE of 0.0152 and a maximum F-measure of 0.8492, outperforming representative methods such as MDSAM and DPNet. Qualitative results show that the proposed method produces saliency maps highly consistent with ground-truth annotations, effectively handling challenging scenarios including projection distortion, boundary discontinuity, multi-object scenes, and complex backgrounds. Ablation studies further confirm that MCA, SGGA, and SFJPM contribute independently while complementing each other to improve detection performance.

  • Conclusions: This paper proposes a novel SAM-based framework for 360° salient object detection that jointly addresses multi-scale representation, spherical distortion awareness, and spatial-frequency feature modeling. The MCA enables efficient multi-scale feature fusion, SGGA explicitly compensates for ERP-induced geometric distortions, and SFJPM enhances long-range dependency modeling. Extensive experiments validate the effectiveness and feasibility of introducing SAM into 360° SOD. Future work will extend this framework to omnidirectional video and multi-modal scenarios to further improve spatiotemporal modeling and scene understanding.
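The spherical patch-embedding step described under Methods, mapping ERP patch centers onto a unit sphere and injecting their spatial relationships into attention, can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the choice of negated great-circle distance as an additive attention bias, and the names patch_centers_to_sphere and great_circle_bias, are assumptions for illustration only.

```python
import torch

def patch_centers_to_sphere(h_patches: int, w_patches: int):
    """Map ERP patch-center grid coordinates to (longitude, latitude).

    ERP columns span longitude [-pi, pi); rows span latitude [pi/2, -pi/2].
    """
    v = (torch.arange(h_patches) + 0.5) / h_patches   # row centers in [0, 1]
    u = (torch.arange(w_patches) + 0.5) / w_patches   # col centers in [0, 1]
    lat = (0.5 - v) * torch.pi        # +pi/2 at the top row, -pi/2 at the bottom
    lon = (u - 0.5) * 2.0 * torch.pi  # -pi at the left column, +pi at the right
    lat, lon = torch.meshgrid(lat, lon, indexing="ij")
    return lon.flatten(), lat.flatten()  # each (N,), N = h_patches * w_patches

def great_circle_bias(lon: torch.Tensor, lat: torch.Tensor, scale: float = 1.0):
    """Negated pairwise great-circle distance between patch centers,
    usable as an additive bias on attention logits (closer on the
    sphere -> larger bias), independent of ERP pixel distance."""
    x = torch.cos(lat) * torch.cos(lon)   # unit-sphere Cartesian coordinates
    y = torch.cos(lat) * torch.sin(lon)
    z = torch.sin(lat)
    xyz = torch.stack((x, y, z), dim=-1)              # (N, 3)
    cosine = (xyz @ xyz.T).clamp(-1.0, 1.0)
    return -scale * torch.acos(cosine)                # (N, N) bias matrix

# Usage inside an attention layer (illustrative):
# logits = q @ k.transpose(-2, -1) / d ** 0.5 + great_circle_bias(lon, lat)
```

A bias of this form keeps patches on opposite sides of the ERP seam adjacent in attention, which is one way to address the boundary discontinuity the abstract mentions.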
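The encoder's partial freezing strategy can likewise be sketched. The snippet below assumes the newly added modules are identifiable by substrings of their parameter names; the markers "mca", "sgga", and "adapter" are hypothetical, not taken from the paper's code.

```python
import torch.nn as nn

ADAPTER_KEYS = ("mca", "sgga", "adapter")  # illustrative module-name markers

def freeze_backbone_except_adapters(encoder: nn.Module) -> None:
    """Partial-freezing strategy: keep the pretrained SAM encoder weights
    fixed and train only the newly inserted adapter/attention modules."""
    for name, param in encoder.named_parameters():
        param.requires_grad = any(key in name for key in ADAPTER_KEYS)
```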
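For the SFJPM, one plausible realization of "jointly capturing spatial structures and frequency-domain global information" combines a local convolution branch with a learnable gate applied to the feature spectrum. A minimal sketch, assuming a real-valued gate on the rFFT of the feature map; the class name SpatialFrequencyBlock and this exact fusion are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

class SpatialFrequencyBlock(nn.Module):
    """Illustrative spatial-frequency fusion: a local 3x3 convolution branch
    plus a global branch that rescales the feature spectrum with a learnable
    per-frequency gate, then sums the two paths. The gate shape ties the
    block to one input resolution (h, w)."""
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        # One learnable gain per rFFT frequency bin (real-valued for simplicity)
        self.freq_gate = nn.Parameter(torch.ones(channels, h, w // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        local_feat = self.spatial(x)
        spec = torch.fft.rfft2(x, norm="ortho")           # (B, C, H, W//2+1)
        global_feat = torch.fft.irfft2(spec * self.freq_gate,
                                       s=x.shape[-2:], norm="ortho")
        return local_feat + global_feat

# Usage: y = SpatialFrequencyBlock(64, 32, 32)(torch.randn(2, 64, 32, 32))
```

Because every frequency bin mixes information from the whole spatial map, the FFT branch gives each output position a global receptive field at the cost of a single transform pair.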
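For reference, the two headline scores under Results and Discussions are standard SOD metrics: MAE is the mean absolute error between the predicted saliency map and the binary ground truth, and the maximum F-measure sweeps binarization thresholds with beta^2 = 0.3, the usual convention. The sketch below shows the conventional definitions; the paper's exact evaluation protocol may differ in details such as threshold count.

```python
import torch

def mae(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Mean Absolute Error between a [0, 1] saliency map and a binary mask."""
    return (pred - gt).abs().mean()

def max_f_measure(pred: torch.Tensor, gt: torch.Tensor,
                  beta2: float = 0.3, steps: int = 255) -> torch.Tensor:
    """Maximum F-measure over a sweep of binarization thresholds."""
    eps = 1e-8
    best = pred.new_zeros(())
    for t in torch.linspace(0.0, 1.0, steps):
        binary = (pred >= t).float()
        tp = (binary * gt).sum()
        precision = tp / (binary.sum() + eps)
        recall = tp / (gt.sum() + eps)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
        best = torch.maximum(best, f)
    return best
```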
    [1]
    CUI Ruikai, HE Siyuan, and QIU Shi. Adaptive low rank adaptation of segment anything to salient object detection[J]. arXiv preprint arXiv:2308.05426, 2023. doi: 10.48550/arXiv.2308.05426.
    [2]
    GAO Shixuan, ZHANG Pingping, YAN Tianyu, et al. Multi-scale and detail-enhanced segment anything model for salient object detection[C]. Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 2024: 9894–9903. doi: 10.1145/3664647.3680650.
    [3]
    LIU Zhengyi, DENG Sheng, WANG Xinrui, et al. SSFam: Scribble supervised salient object detection family[J]. IEEE Transactions on Multimedia, 2025, 27: 1988–2000. doi: 10.1109/TMM.2025.3543092.
    [4]
    WEI Jun, WANG Shuhui, WU Zhe, et al. Label decoupling framework for salient object detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 13022–13031. doi: 10.1109/CVPR42600.2020.01304.
    [5]
    LIU Nian, ZHANG Ni, WAN Kaiyuan, et al. Visual saliency transformer[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, Canada, 2021: 4702–4712. doi: 10.1109/ICCV48922.2021.00468.
    [6]
    LI Gongyang, BAI Zhen, LIU Zhi, et al. Salient object detection in optical remote sensing images driven by transformer[J]. IEEE Transactions on Image Processing, 2023, 32: 5257–5269. doi: 10.1109/TIP.2023.3314285.
    [7]
    ZHU Jiayi, QIN Xuebin, and ELSADDIK A. DC-Net: Divide-and-conquer for salient object detection[J]. Pattern Recognition, 2025, 157: 110903. doi: 10.1016/j.patcog.2024.110903.
    [8]
    XIE Chenxi, XIA Changqun, MA Mingcan, et al. Pyramid grafting network for one-stage high resolution saliency detection[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 11707–11716. doi: 10.1109/CVPR52688.2022.01142.
    [9]
    MA Mingcan, XIA Changqun, XIE Chenxi, et al. Boosting broader receptive fields for salient object detection[J]. IEEE Transactions on Image Processing, 2023, 32: 1026–1038. doi: 10.1109/TIP.2022.3232209.
    [10]
    KIRILLOV A, MINTUN E, RAVI N, et al. Segment anything[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 3992–4003. doi: 10.1109/ICCV51070.2023.00371.
    [11]
    HOULSBY N, GIURGIU A, JASTRZEBSKI S, et al. Parameter-efficient transfer learning for NLP[C]. Proceedings of the 36th International Conference on Machine Learning, Long Beach, USA, 2019: 2790–2799.
    [12]
    HU E, SHEN Yelong, WALLIS P, et al. LoRA: Low-rank adaptation of large language models[C]. 10th International Conference on Learning Representations, 2022.
    [13]
    ZHENG Linghao, PU Xinyang, ZHANG Su, et al. Tuning a SAM-based model with multicognitive visual adapter to remote sensing instance segmentation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2025, 18: 2737–2748. doi: 10.1109/JSTARS.2024.3504409.
    [14]
    LI Jia, SU Jinming, XIA Changqun, et al. Distortion-adaptive salient object detection in 360° omnidirectional images[J]. IEEE Journal of Selected Topics in Signal Processing, 2020, 14(1): 38–48. doi: 10.1109/JSTSP.2019.2957982.
    [15]
    MA Guangxiao, LI Shuai, CHEN Chenglizhao, et al. Stage-wise salient object detection in 360° omnidirectional image via object-level semantical saliency ranking[J]. IEEE Transactions on Visualization and Computer Graphics, 2020, 26(12): 3535–3545. doi: 10.1109/TVCG.2020.3023636.
    [16]
    WEN Hongfa, ZHU Zunjie, ZHOU Xiaofei, et al. Consistency perception network for 360° omnidirectional salient object detection[J]. Neurocomputing, 2025, 620: 129243. doi: 10.1016/j.neucom.2024.129243.
    [17]
    ZHANG Yi, HAMIDOUCHE W, and DEFORGES O. Channel-spatial mutual attention network for 360° salient object detection[C]. 2022 26th International Conference on Pattern Recognition, Montreal, Canada, 2022: 3436–3442. doi: 10.1109/ICPR56361.2022.9956354.
    [18]
    HUANG Mengke, LIU Zhi, LI Gongyang, et al. FANet: Features adaptation network for 360° omnidirectional salient object detection[J]. IEEE Signal Processing Letters, 2020, 27: 1819–1823. doi: 10.1109/LSP.2020.3028192.
    [19]
    CONG Runmin, HUANG Ke, LEI Jianjun, et al. Multi-projection fusion and refinement network for salient object detection in 360° omnidirectional image[J]. IEEE Transactions on Neural Networks and Learning Systems, 2024, 35(7): 9495–9507. doi: 10.1109/TNNLS.2022.3233883.
    [20]
    ZHANG Jie, ZHANG Qiudan, SHEN Xuelin, et al. Salient object detection on 360° omnidirectional image with bi-branch hybrid projection network[C]. 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP), Poitiers, France, 2023: 1–5. doi: 10.1109/MMSP59012.2023.10337695.
    [21]
    HE Zhentao, SHAO Feng, CHEN Gang, et al. SCFANet: Semantics and context feature aggregation network for 360° salient object detection[J]. IEEE Transactions on Multimedia, 2024, 26: 2276–2288. doi: 10.1109/TMM.2023.3293994.
    [22]
    HE Zhentao, SHAO Feng, XIE Zhengxuan, et al. SIHENet: Semantic interaction and hierarchical embedding network for 360° salient object detection[J]. IEEE Transactions on Instrumentation and Measurement, 2025, 74: 5003815. doi: 10.1109/TIM.2024.3507047.
    [23]
    HUANG Mengke, LI Gongyang, LIU Zhi, et al. Lightweight distortion-aware network for salient object detection in omnidirectional images[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(10): 6191–6197. doi: 10.1109/TCSVT.2023.3253685.
    [24]
    ZHAO Yinjie, ZHAO Lichen, YU Qian, et al. Distortion-aware transformer in 360° salient object detection[C]. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, 2023: 499–508. doi: 10.1145/3581783.3612025.
    [25]
    CHEN Xiaolei, ZHANG Xuegong, DU Zelong, et al. Distortion semantic aggregation network for salient object detection in 360° omnidirectional images[J]. Journal of Image and Graphics, 2025, 30(7): 2451–2467. doi: 10.11834/jig.240371.
    [26]
    CHEN Xiaolei, DU Zelong, ZHANG Xuegong, et al. Distortion-adaptive and position-aware network for salient object detection in 360° omnidirectional image[J]. Journal of Image and Graphics, 2025, 30(8): 2758–2774. doi: 10.11834/jig.240592.
    [27]
    WU Junjie, XIA Changqun, YU Tianshu, et al. View-aware salient object detection for 360° omnidirectional image[J]. IEEE Transactions on Multimedia, 2023, 25: 6471–6484. doi: 10.1109/TMM.2022.3209015.
    [28]
    DAI Haowei, BAO Liuxin, SHEN Kunye, et al. 360° omnidirectional salient object detection with multi-scale interaction and densely-connected prediction[C]. Proceedings of 12th International Conference on Image and Graphics, Nanjing, China, 2023: 427–438. doi: 10.1007/978-3-031-46305-1_35.
    [29]
    CHEN Gang, SHAO Feng, CHAI Xiongli, et al. Multi-stage salient object detection in 360° omnidirectional image using complementary object-level semantic information[J]. IEEE Transactions on Emerging Topics in Computational Intelligence, 2024, 8(1): 776–789. doi: 10.1109/TETCI.2023.3259433.
    [30]
    CHEN Xiaolei, WANG Xing, ZHANG Xuegong, et al. Adjacent coordination network for salient object detection in 360 degree omnidirectional images[J]. Journal of Electronics & Information Technology, 2024, 46(12): 4529–4541. doi: 10.11999/JEIT240502.
    [31]
    YUN I, SHIN C, LEE H, et al. EGformer: Equirectangular geometry-biased transformer for 360 depth estimation[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2023: 6078–6089. doi: 10.1109/ICCV51070.2023.00561.