
Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention

CHEN Xiaolei, ZHANG Pengcheng, LU Yubing, CAO Baoning

Citation: CHEN Xiaolei, ZHANG Pengcheng, LU Yubing, CAO Baoning. Saliency Detection of Panoramic Images Based on Robust Vision Transformer and Multiple Attention[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2246-2255. doi: 10.11999/JEIT220684


doi: 10.11999/JEIT220684
Funds: The National Natural Science Foundation of China (61967012)
Details
    About the authors:

    CHEN Xiaolei: male, Ph.D., Associate Professor; research interests: artificial intelligence, computer vision, and virtual reality

    ZHANG Pengcheng: male, M.S. student; research interest: image saliency detection

    LU Yubing: male, M.S. student; research interests: image processing and pose estimation

    CAO Baoning: male, M.S. student; research interests: virtual reality, image processing, and artificial intelligence

    Corresponding author:

    CHEN Xiaolei, chenxl703@lut.edu.cn

  • CLC number: TN911.73; TP391

  • Abstract: To address the low detection accuracy, slow model convergence, and heavy computation of current panoramic image saliency detection methods, this paper proposes a U-shaped network based on a Robust vision transformer and Multiple attention (URMNet). The model extracts multi-scale features of panoramic images with spherical convolution, which alleviates the distortion introduced by equirectangular projection. A robust vision transformer module extracts the salient information contained in feature maps at four scales, using convolutional embedding to reduce feature-map resolution and strengthen the robustness of the model. A multiple-attention module selectively fuses multi-dimensional attention according to the relationship between spatial attention and channel attention. Multi-level features are then fused step by step to form the panoramic saliency map. A latitude-weighted loss function gives the model faster convergence. Experiments on two public datasets show that, owing to the robust vision transformer module and the multiple-attention module, the proposed model outperforms six other state-of-the-art methods and further improves the accuracy of panoramic image saliency detection.
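The latitude-weighted loss is only named in the abstract and not defined on this page. As a hedged illustration of the general idea, the PyTorch sketch below weights the per-pixel error of an equirectangular saliency map by the cosine of its latitude, so that the polar rows over-sampled by the projection contribute less. The function names (`latitude_weights`, `latitude_weighted_mse`) and the MSE base term are assumptions made for illustration; the paper's actual formulation (including the L1/L2/L3 components listed in Table 4) may differ.

```python
import math

import torch


def latitude_weights(height, width, device=None):
    # Row latitudes of an equirectangular map: +pi/2 (north pole) at the top
    # row, -pi/2 (south pole) at the bottom. cos(latitude) shrinks the weight
    # of polar rows, which the projection over-samples.
    lat = torch.linspace(math.pi / 2, -math.pi / 2, height, device=device)
    w = torch.cos(lat).clamp(min=1e-6)                       # shape (H,)
    return w.view(1, 1, height, 1).expand(1, 1, height, width)


def latitude_weighted_mse(pred, target):
    # pred, target: (B, 1, H, W) predicted and ground-truth saliency maps.
    w = latitude_weights(pred.shape[-2], pred.shape[-1], device=pred.device)
    per_sample = ((pred - target) ** 2 * w).flatten(1).sum(dim=1) / w.sum()
    return per_sample.mean()
```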
  • Figure 1  Schematic of URMNet

    Figure 2  Schematic of the transformer module

    Figure 3  Schematic of feature map preprocessing

    Figure 4  Schematic of multi-channel self-attention

    Figure 5  Schematic of the SAM

    Figure 6  Schematic of the CAM

    Figure 7  Visual comparison of the proposed method and other methods on the AAOI dataset

    Figure 8  Visual comparison of the proposed method and other methods on the ASalient360 dataset

    Figure 9  Zoomed-in visual comparison of saliency detection results after adding noise

    Table 1  Experimental results with different weighting factors

    SAM β   CAM 1-β   CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑   Grade
    0       1.0       0.9005   0.7787   0.1970   3.5346   0.9914      0.9755       3.3809
    0.2     0.8       0.8803   0.7785   0.4403   3.4964   0.9861      0.9618       0.7047
    0.4     0.6       0.9008   0.7921   0.5070   3.7190   0.9936      0.9795       3.4726
    0.5     0.5       0.9067   0.8119   0.2198   3.2849   0.9898      0.9772       3.5050
    0.6     0.4       0.8912   0.7871   0.2350   3.2212   0.9840      0.9655       1.2312
    0.8     0.2       0.8771   0.7561   0.2317   3.2893   0.9864      0.9756       1.1757
    1.0     0         0.9023   0.7890   0.3775   3.6737   0.9919      0.9774       3.4871
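Table 1 sweeps the factor β that balances the spatial attention (SAM) branch against the channel attention (CAM) branch, with β = 0.5 scoring best overall. The sketch below shows what such a fixed-weight fusion can look like in PyTorch; `SpatialAttention`, `ChannelAttention`, and `WeightedAttentionFusion` are illustrative SE/CBAM-style stand-ins, not the modules defined in the paper.

```python
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    # Squeeze-and-excitation style channel gate (illustrative stand-in).
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                                    # x: (B, C, H, W)
        gate = self.mlp(x.mean(dim=(2, 3)))                  # (B, C)
        return x * gate.unsqueeze(-1).unsqueeze(-1)


class SpatialAttention(nn.Module):
    # Single-map spatial gate built from channel-wise mean/max statistics.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))


class WeightedAttentionFusion(nn.Module):
    # Combine the two branches with a fixed factor beta; Table 1 favours 0.5.
    def __init__(self, channels, beta=0.5):
        super().__init__()
        self.sam = SpatialAttention()
        self.cam = ChannelAttention(channels)
        self.beta = beta

    def forward(self, x):
        return self.beta * self.sam(x) + (1.0 - self.beta) * self.cam(x)
```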

    Table 2  Objective metric comparison of the models on the AAOI dataset

    Method                     CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑
    URMNet                     0.8934   0.7918   0.1787   3.7113   0.9865      0.9707
    U-Net(2015)[16]            0.8550   0.7694   0.2647   2.9639   0.9741      0.9582
    AttU-Net(2018)[17]         0.8663   0.7675   0.2359   3.3212   0.9796      0.9632
    Spherical U-Net(2018)[3]   0.7832   0.7304   0.3167   2.4795   0.9467      0.9295
    panoramic CNN(2020)[5]     0.8520   0.7533   0.2412   3.1999   0.9778      0.9641
    QAU-Net(2021)[18]          0.7314   0.6530   0.4203   1.8678   0.9226      0.8926
    UCTransNet(2021)[19]       0.8619   0.7731   0.2105   3.0204   0.9814      0.9625
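The tables report the standard saliency evaluation metrics: CC, SIM, and KLDiv compare the predicted map with the ground-truth density map, while NSS and the two AUC variants additionally use fixation points. Below is a compact NumPy sketch of the three map-based metrics, following the usual saliency-benchmark definitions; the function names are ours, not code from the paper.

```python
import numpy as np


def _as_distribution(m, eps=1e-12):
    # Normalise a saliency map so that it sums to 1.
    m = np.asarray(m, dtype=np.float64)
    return m / (m.sum() + eps)


def cc(pred, gt):
    # Pearson linear correlation coefficient between the two maps.
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(gt, dtype=np.float64).ravel()
    return float(np.corrcoef(p, g)[0, 1])


def sim(pred, gt):
    # Histogram intersection of the normalised maps (higher is better).
    return float(np.minimum(_as_distribution(pred), _as_distribution(gt)).sum())


def kldiv(pred, gt, eps=1e-12):
    # KL divergence of the prediction from the ground truth (lower is better).
    p, g = _as_distribution(pred), _as_distribution(gt)
    return float(np.sum(g * np.log(eps + g / (p + eps))))
```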

    Table 3  Objective metric comparison of the models on the ASalient360 dataset

    Method                     CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑
    URMNet                     0.6683   0.6602   0.5834   2.9874   0.9449      0.9336
    U-Net(2015)[16]            0.6061   0.6404   0.4397   2.6228   0.9180      0.8973
    AttU-Net(2018)[17]         0.5589   0.6262   0.4892   1.9840   0.8871      0.8644
    Spherical U-Net(2018)[3]   0.6028   0.6412   0.6714   2.5834   0.9384      0.9183
    panoramic CNN(2020)[5]     0.6225   0.6527   0.4049   2.0324   0.8975      0.8482
    QAU-Net(2021)[18]          0.5641   0.6280   0.5138   2.7556   0.8889      0.8859
    UCTransNet(2021)[19]       0.5429   0.6165   0.5246   2.0041   0.8990      0.8683

    Table 4  Ablation experiments on the network and the loss function

    Model      Loss function   CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑
    Baseline   L1              0.7389   0.6935   2.2079   1.5107   0.8634      0.8310
    Baseline   L2              0.7300   0.6886   2.2614   1.5002   0.8636      0.8169
    Baseline   L3              0.6629   0.6604   1.2939   1.2369   0.8184      0.7806
    Baseline   Loss            0.8604   0.7618   0.3906   3.0463   0.9846      0.9614
    URMNet     L1              0.7456   0.7007   2.2068   1.5043   0.8627      0.8304
    URMNet     L2              0.7565   0.6845   2.9497   1.6848   0.8909      0.8166
    URMNet     L3              0.7557   0.6909   3.1000   1.6607   0.8830      0.8465
    URMNet     Loss            0.9067   0.8119   0.2198   3.2849   0.9898      0.9772

    Table 5  Ablation results for the RVT and MA modules

    RVT   MA    CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑
    ×     ×     0.8335   0.7392   0.2947   2.9639   0.9779      0.9604
    √     ×     0.8664   0.7788   0.3682   3.0885   0.9821      0.9702
    ×     √     0.8553   0.7118   0.3317   3.4291   0.9848      0.9576
    √     √     0.8922   0.7805   0.2278   3.3100   0.9910      0.9786

    Table 6  Generalization performance comparison of different models

    Method                     CC↑      SIM↑     KLDiv↓   NSS↑     AUC_Judd↑   AUC_Borji↑
    URMNet                     0.5899   0.6116   1.0708   1.8395   0.9181      0.8917
    U-Net(2015)[16]            0.5402   0.5733   1.9622   1.6975   0.9022      0.8750
    AttU-Net(2018)[17]         0.4906   0.5572   1.9561   1.4668   0.8896      0.8556
    panoramic CNN(2020)[5]     0.5146   0.5770   1.1245   1.4486   0.8865      0.8504
    QAU-Net(2021)[18]          0.5044   0.5989   0.5373   1.5522   0.8854      0.8362
    UCTransNet(2021)[19]       0.5410   0.5760   1.8189   1.7359   0.9100      0.8874

    Table 7  Complexity comparison of different models

    Complexity metric   U-Net[16]   QAU-Net[18]   UCTransNet[19]   URMNet
    GFLOPs (G)          4.1345      39.5893       56.4113          1.5567
    Params (M)          23.2655     41.8559       70.4118          27.5091
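Table 7 compares parameter counts and GFLOPs. The parameter count can be read directly off a PyTorch model as sketched below; FLOPs are normally obtained with a separate profiler (for example thop or ptflops), whose APIs are not shown here. `count_parameters_m` and the toy network are illustrative only, not the paper's code.

```python
import torch.nn as nn


def count_parameters_m(model: nn.Module) -> float:
    # Trainable parameters in millions, i.e. the "Params (M)" row of Table 7.
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


# Example with a toy network; a full model would be counted the same way.
toy = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 1))
print(f"{count_parameters_m(toy):.4f} M parameters")
```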
  • [1] LIU Zhengyi, DUAN Quntao, SHI Song, et al. RGB-D image saliency detection based on multi-modal feature-fused supervision[J]. Journal of Electronics & Information Technology, 2020, 42(4): 997–1004. doi: 10.11999/JEIT190297
    [2] WEN Anzhou. Real-time panoramic multi-target detection based on mobile machine vision and deep learning[J]. Journal of Physics: Conference Series, 2020, 1650: 032113. doi: 10.1088/1742-6596/1650/3/032113
    [3] ZHANG Ziheng, XU Yanyu, YU Jingyi, et al. Saliency detection in 360° videos[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 504–520.
    [4] COORS B, CONDURACHE A P, and GEIGER A. SphereNet: Learning spherical representations for detection and classification in omnidirectional images[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 525–541.
    [5] MARTÍN D, SERRANO A, and MASIA B. Panoramic convolutions for 360° single-image saliency prediction[C/OL]. The Fourth Workshop on Computer Vision for AR/VR, 2020: 1–4.
    [6] DAI Feng, ZHANG Youqiang, MA Yike, et al. Dilated convolutional neural networks for panoramic image saliency prediction[C]. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 2020: 2558–2562.
    [7] MONROY R, LUTZ S, CHALASANI T, et al. SalNet360: Saliency maps for omni-directional images with CNN[J]. Signal Processing: Image Communication, 2018, 69: 26–34. doi: 10.1016/j.image.2018.05.005
    [8] DAHOU Y, TLIBA M, MCGUINNESS K, et al. ATSal: An attention based architecture for saliency prediction in 360° videos[M]. Cham: Springer, 2021: 305–320.
    [9] ZHU Dandan, CHEN Yongqing, ZHAO Defang, et al. Saliency prediction on omnidirectional images with attention-aware feature fusion network[J]. Applied Intelligence, 2021, 51(8): 5344–5357. doi: 10.1007/s10489-020-01857-3
    [10] CHAO Fangyi, ZHANG Lu, HAMIDOUCHE W, et al. A multi-FoV viewport-based visual saliency model using adaptive weighting losses for 360° images[J]. IEEE Transactions on Multimedia, 2020, 23: 1811–1826. doi: 10.1109/tmm.2020.3003642
    [11] XU Mai, YANG Li, TAO Xiaoming, et al. Saliency prediction on omnidirectional image with generative adversarial imitation learning[J]. IEEE Transactions on Image Processing, 2021, 30: 2087–2102. doi: 10.1109/tip.2021.3050861
    [12] GUTIÉRREZ J, DAVID E J, COUTROT A, et al. Introducing un salient360! Benchmark: A platform for evaluating visual attention models for 360° contents[C]. The 2018 Tenth International Conference on Quality of Multimedia Experience, Cagliari, Italy, 2018: 1–3.
    [13] MAO Xiaofeng, QI Gege, CHEN Yuefeng, et al. Towards robust vision transformer[J]. arXiv: 2105.07926, 2021.
    [14] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010.
    [15] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[C/OL]. The 9th International Conference on Learning Representations, 2021.
    [16] RONNEBERGER O, FISCHER P, and BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]. The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015: 234–241.
    [17] OKTAY O, SCHLEMPER J, LE FOLGOC L, et al. Attention u-net: Learning where to look for the pancreas[J]. arXiv: 1804.03999, 2018.
    [18] HONG Luminzi, WANG Risheng, LEI Tao, et al. Qau-Net: Quartet attention U-Net for liver and liver-tumor segmentation[C]. 2021 IEEE International Conference on Multimedia and Expo, Shenzhen, China, 2021: 1–6.
    [19] WANG Haonan, CAO Peng, WANG Jiaqi, et al. UCTransNet: Rethinking the skip connections in U-Net from a channel-wise perspective with transformer[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(3): 2441–2449. doi: 10.1609/aaai.v36i3.20144
    [20] CORNIA M, BARALDI L, SERRA G, et al. Predicting human eye fixations via an LSTM-based saliency attentive model[J]. IEEE Transactions on Image Processing, 2018, 27(10): 5142–5154. doi: 10.1109/tip.2018.2851672
    [21] LOU Jianxun, LIN Hanhe, MARSHALL D, et al. TranSalNet: Towards perceptually relevant visual saliency prediction[J]. arXiv: 2110.03593, 2021.
Publication history
  • Received: 2022-05-26
  • Revised: 2022-08-18
  • Available online: 2022-08-23
  • Published: 2023-06-10
