Citation: ZHU Lei, YUAN Jinyao, WANG Wenwu, CAI Xiaoman. Saliency Object Detection Utilizing Adaptive Convolutional Attention and Mask Structure[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT240431

Saliency Object Detection Utilizing Adaptive Convolutional Attention and Mask Structure

doi: 10.11999/JEIT240431
  • Received Date: 2024-05-13
  • Revised Date: 2024-09-18
  • Available Online: 2024-09-24
  • Salient Object Detection (SOD) aims to mimic the attention and cognitive mechanisms of the human visual system to automatically extract prominent objects from a scene. While existing Convolutional Neural Network (CNN)- and Transformer-based models continue to advance performance in this field, little research addresses two specific issues: (1) Most methods adopt pixel-wise dense prediction to obtain per-pixel saliency values. This does not align with the scene-analysis mechanism of the human visual system, which typically performs a holistic analysis of semantic regions rather than attending to individual pixels. (2) Enhancing contextual association is widely emphasized in SOD, yet acquiring long-range contextual features through a Transformer backbone does not necessarily offer an advantage; SOD should focus on center-neighborhood differences within appropriately sized regions rather than on global long-range dependencies. To address these issues, we propose a novel salient object detection model that integrates CNN-based adaptive attention and masked attention into the network. The model includes a mask-aware decoding module that perceives image features by restricting cross-attention to the predicted mask region, helping the network focus on the entire extent of salient objects. It also includes a convolutional-attention contextual feature enhancement module which, unlike Transformers that establish long-range relationships layer by layer, captures contextual associations only in the highest-level features, avoiding the introduction of irrelevant global information. Illustrative sketches of both modules are given below. Experimental evaluations on four widely used datasets demonstrate that the proposed method achieves significant performance improvements across different scenarios, showing good generalization ability and stability.
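To make the two modules concrete, below are minimal PyTorch sketches. Both are illustrative assumptions written for this abstract, not the authors' implementation; all class names, shapes, and hyperparameters (the 0.5 foreground threshold, the kernel sizes) are hypothetical.

The first sketch restricts cross-attention to the region that the previous decoder layer predicted as foreground, in the spirit of Mask2Former-style masked attention: background positions receive -inf before the softmax, so each query aggregates features only from its current mask estimate.

    import torch
    import torch.nn as nn

    class MaskAwareCrossAttention(nn.Module):
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.num_heads = num_heads
            self.scale = (dim // num_heads) ** -0.5
            self.q = nn.Linear(dim, dim)
            self.kv = nn.Linear(dim, 2 * dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, queries, features, mask_logits):
            # queries:     (B, Nq, C)  saliency queries
            # features:    (B, HW, C)  flattened image features
            # mask_logits: (B, Nq, HW) mask predicted by the previous decoder layer
            B, Nq, C = queries.shape
            HW = features.shape[1]
            h = self.num_heads
            q = self.q(queries).view(B, Nq, h, C // h).transpose(1, 2)   # (B, h, Nq, C/h)
            k, v = self.kv(features).view(B, HW, 2, h, C // h).permute(2, 0, 3, 1, 4)
            attn = (q @ k.transpose(-2, -1)) * self.scale                # (B, h, Nq, HW)
            # Masked attention: positions the current estimate labels as
            # background are excluded before the softmax.
            background = (mask_logits.sigmoid() < 0.5).unsqueeze(1)      # (B, 1, Nq, HW)
            attn = attn.masked_fill(background, float('-inf'))
            attn = torch.nan_to_num(attn.softmax(dim=-1))                # guard all-background rows
            out = (attn @ v).transpose(1, 2).reshape(B, Nq, C)
            return self.proj(out)

The second sketch gates the highest-level backbone features with a convolutional attention map built from depthwise large-kernel and dilated convolutions (the well-known Large Kernel Attention pattern, used here as a stand-in for the paper's module). Because the receptive field is enlarged but still local, the module emphasizes center-neighborhood contrast instead of global token mixing, matching the abstract's motivation.

    class ConvContextEnhancement(nn.Module):
        def __init__(self, dim):
            super().__init__()
            # 5x5 depthwise + 7x7 depthwise with dilation 3 approximate a wide
            # receptive field cheaply; a 1x1 convolution mixes channels.
            self.local = nn.Conv2d(dim, dim, kernel_size=5, padding=2, groups=dim)
            self.context = nn.Conv2d(dim, dim, kernel_size=7, padding=9, dilation=3, groups=dim)
            self.mix = nn.Conv2d(dim, dim, kernel_size=1)

        def forward(self, x):
            # x: (B, C, H, W) highest-level features only; the output gates x,
            # so context acts as an attention map rather than a global mixer.
            return x * self.mix(self.context(self.local(x)))

In a typical encoder-decoder arrangement, ConvContextEnhancement would refine the deepest encoder features, which then serve as the keys and values consumed by stacked MaskAwareCrossAttention decoder layers, each layer re-predicting the mask that constrains the next.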
  • [1]
    ZHOU Huajun, XIE Xiaohua, LAI Jianhuang, et al. Interactive two-stream decoder for accurate and fast saliency detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 9138–9147. doi: 10.1109/CVPR42600.2020.00916.
    [2]
    LIANG Pengpeng, PANG Yu, LIAO Chunyuan, et al. Adaptive objectness for object tracking[J]. IEEE Signal Processing Letters, 2016, 23(7): 949–953. doi: 10.1109/LSP.2016.2556706.
    [3]
    RUTISHAUSER U, WALTHER D, KOCH C, et al. Is bottom-up attention useful for object recognition?[C]. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, USA, 2004: II-II. doi: 10.1109/CVPR.2004.1315142.
    [4]
    ZHANG Jing, FAN Dengping, DAI Yuchao, et al. RGB-D saliency detection via cascaded mutual information minimization[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 4318–4327. doi: 10.1109/ICCV48922.2021.00430.
    [5]
    LI Aixuan, MAO Yuxin, ZHANG Jing, et al. Mutual information regularization for weakly-supervised RGB-D salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(1): 397–410. doi: 10.1109/TCSVT.2023.3285249.
    [6]
    LIAO Guibiao, GAO Wei, LI Ge, et al. Cross-collaborative fusion-encoder network for robust RGB-thermal salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(11): 7646–7661. doi: 10.1109/TCSVT.2022.3184840.
    [7]
    CHEN Yilei, Li Gongyang, AN Ping, et al. Light field salient object detection with sparse views via complementary and discriminative interaction network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34(2): 1070–1085. doi: 10.1109/TCSVT.2023.3290600.
    [8]
    ITTI L, KOCH C, and NIEBUR E. A model of saliency-based visual attention for rapid scene analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1998, 20(11): 1254–1259. doi: 10.1109/34.730558.
    [9]
    JIANG Huaizu, WANG Jingdong, YUAN Zejian, et al. Salient object detection: A discriminative regional feature integration approach[C]. 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 2083–2090. doi: 10.1109/CVPR.2013.271.
    [10]
    LI Guanbin and YU Yizhou. Visual saliency based on multiscale deep features[C]. 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, USA, 2015: 5455–5463. doi: 10.1109/CVPR.2015.7299184.
    [11]
    LEE G, TAI Y W, and KIM J. Deep saliency with encoded low level distance map and high level features[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 660–668. doi: 10.1109/CVPR.2016.78.
    [12]
    WANG Linzhao, WANG Lijun, LU Huchuan, et al. Salient object detection with recurrent fully convolutional networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1734–1746. doi: 10.1109/TPAMI.2018.2846598.
    [13]
    LIU Nian, ZHANG Ni, WAN Kaiyuan, et al. Visual saliency transformer[C]. 2021 IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 4702–4712. doi: 10.1109/ICCV48922.2021.00468.
    [14]
    YUN Yike and LIN Weisi. SelfReformer: Self-refined network with transformer for salient object detection[J]. arXiv: 2205.11283, 2022.
    [15]
    ZHU Lei, CHEN Jiaxing, HU Xiaowei, et al. Aggregating attentional dilated features for salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(10): 3358–3371. doi: 10.1109/TCSVT.2019.2941017.
    [16]
    XIE Enze, WANG Wenhai, YU Zhiding, et al. SegFormer: Simple and efficient design for semantic segmentation with transformers[C]. The 35th International Conference on Neural Information Processing Systems, 2021: 924.
    [17]
    WANG Libo, LI Rui, ZHANG Ce, et al. UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2022, 190: 196–214. doi: 10.1016/j.isprsjprs.2022.06.008.
    [18]
    ZHOU Daquan, KANG Bingyi, JIN Xiaojie, et al. DeepViT: Towards deeper vision transformer[J]. arXiv: 2103.11886, 2021.
    [19]
    GAO Shanghua, CHENG Mingming, ZHAO Kai, et al. Res2Net: A new multi-scale backbone architecture[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(2): 652–662. doi: 10.1109/TPAMI.2019.2938758.
    [20]
    LIN Xian, YAN Zengqiang, DENG Xianbo, et al. ConvFormer: Plug-and-play CNN-style transformers for improving medical image segmentation[C]. The 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, Canada, 2023: 642–651. doi: 10.1007/978-3-031-43901-8_61.
    [21]
    CHENG Bowen, MISRA I, SCHWING A G, et al. Masked-attention mask transformer for universal image segmentation[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 1280–1289. doi: 10.1109/CVPR52688.2022.00135.
    [22]
    ZHAO Jiaxing, LIU Jiangjiang, FAN Dengping, et al. EGNet: Edge guidance network for salient object detection[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 8778–8787. doi: 10.1109/ICCV.2019.00887.
    [23]
    LIU Jiangjiang, HOU Qibin, CHENG Mingming, et al. A simple pooling-based design for real-time salient object detection[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3912–3921. doi: 10.1109/CVPR.2019.00404.
    [24]
    PANG Youwei, ZHAO Xiaoqi, ZHANG Lihe, et al. Multi-scale interactive network for salient object detection[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 9410–9419. doi: 10.1109/CVPR42600.2020.00943.
    [25]
    HU Xiaowei, FU C, ZHU Lei, et al. SAC-Net: Spatial attenuation context for salient object detection[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(3): 1079–1090. doi: 10.1109/TCSVT.2020.2995220.
    [26]
    ZHUGE Mingchen, FAN Dengping, LIU Nian, et al. Salient object detection via integrity learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3738–3752. doi: 10.1109/TPAMI.2022.3179526.
    [27]
    WANG Yi, WANG Ruili, FAN Xin, et al. Pixels, regions, and objects: Multiple enhancement for salient object detection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 10031–10040. doi: 10.1109/CVPR527292023.00967.
    [28]
    LUO Ziyang, LIU Nian, ZHAO Wangbo, et al. VSCode: General visual salient and camouflaged object detection with 2D prompt learning[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 17169–17180. doi: 10.1109/CVPR52733.2024.01625.
  • 加载中
