Unsupervised Monocular Depth Estimation Based on Dense Feature Fusion

Ying CHEN, Yiliang WANG

Citation: Ying CHEN, Yiliang WANG. Unsupervised Monocular Depth Estimation Based on Dense Feature Fusion[J]. Journal of Electronics & Information Technology, 2021, 43(10): 2976-2984. doi: 10.11999/JEIT200590


doi: 10.11999/JEIT200590
Funds: The National Natural Science Foundation of China (61573168)
Article information
    Author biographies:

    Ying CHEN: female, born in 1976, Ph.D., professor; her research interests include information fusion and pattern recognition

    Yiliang WANG: male, born in 1997, master's student; his research interests include computer vision and pattern recognition

    Corresponding author:

    Ying CHEN, chenying@jiangnan.edu.cn

  • CLC number: TN911.73; TP391

  • Abstract: To address the low quality, blurred boundaries, and excessive artifacts of depth maps produced by unsupervised monocular depth estimation, this paper proposes an encoder-decoder network structure based on dense feature fusion. A Dense Feature Fusion Layer (DFFL) is designed and inserted into the U-shaped encoder-decoder in the form of dense connections, while the encoder is simplified to balance the capacities of the encoder and decoder. During training, rectified stereo image pairs are fed to the network, and the similarity of the reconstructed views constrains the network to generate disparity maps. At test time, the generated disparity map is converted into a depth map using the known camera baseline and focal length. Experimental results on the KITTI dataset show that the proposed method outperforms existing algorithms in prediction accuracy and error metrics.
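The abstract states that, at test time, the predicted disparity map is converted to depth from the known camera baseline and focal length. Below is a minimal sketch of that standard stereo conversion, depth = f × B / d; the function name is illustrative and it assumes the network's disparity is already expressed in pixels (if it is normalized by image width, scale it by the width first).

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m, eps=1e-8):
    """Convert a disparity map (in pixels) to metric depth via depth = f * B / d.

    focal_length_px and baseline_m come from the calibrated stereo rig
    (for KITTI, roughly f ≈ 721 px and B ≈ 0.54 m); eps avoids division by zero.
    """
    return focal_length_px * baseline_m / np.maximum(disparity_px, eps)
```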
  • Figure 1  Framework of the unsupervised depth estimation algorithm

    Figure 2  Network topologies of U-Net, U-Net++, and the proposed network

    Figure 3  The dense feature fusion layer and its dense connections

    Figure 4  Network framework

    Figure 5  Visual comparison of results on the KITTI dataset
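The training signal described in the abstract, the similarity between a view reconstructed from the predicted disparity and the real view (Figure 1), is commonly implemented in this line of work (e.g. Monodepth [11], using SSIM [18]) as a weighted combination of SSIM and L1 photometric terms. The sketch below illustrates that generic appearance-matching loss; it is an assumption for illustration, not necessarily the exact loss used in this paper.

```python
import torch
import torch.nn.functional as F

def ssim_dissimilarity(x, y):
    """Per-pixel (1 - SSIM)/2 over 3x3 windows, following Wang et al. [18]."""
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2))
    return torch.clamp((1 - ssim) / 2, 0, 1)

def photometric_loss(reconstructed, target, alpha=0.85):
    """Weighted SSIM + L1 appearance-matching loss between the reconstructed
    view and the real view."""
    l1 = torch.abs(reconstructed - target)
    return (alpha * ssim_dissimilarity(reconstructed, target) + (1 - alpha) * l1).mean()
```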

    Table 1  Encoder parameters before and after modification (each bracket lists the 1×1, 3×3, 1×1 convolution widths of one bottleneck unit; ×n is the number of repeats)

    | Stage    | R50                                                       | PR50                                |
    | block1   | 7×7, 64, stride 2                                         | [1×1, 8; 3×3, 8; 1×1, 32] × 2       |
    | block2_x | 3×3 max pool, stride 2; [1×1, 64; 3×3, 64; 1×1, 256] × 3  | [1×1, 32; 3×3, 32; 1×1, 128] × 3    |
    | block3_x | [1×1, 128; 3×3, 128; 1×1, 512] × 4                        | [1×1, 64; 3×3, 64; 1×1, 256] × 4    |
    | block4_x | [1×1, 256; 3×3, 256; 1×1, 1024] × 6                       | [1×1, 128; 3×3, 128; 1×1, 512] × 6  |
    | block5_x | [1×1, 512; 3×3, 512; 1×1, 2048] × 3                       | [1×1, 256; 3×3, 256; 1×1, 1024] × 3 |
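Table 1 shows the simplified encoder (PR50) replacing R50's 7×7 stem with two narrow bottleneck units and roughly halving the channel widths of every stage. A generic ResNet-style bottleneck with the PR50 widths is sketched below; the class and the stride choice for the first stage are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Generic 1x1 -> 3x3 -> 1x1 residual bottleneck; the channel widths used
    below follow the PR50 column of Table 1 (e.g. block2_x uses 32-32-128)."""

    def __init__(self, in_ch, mid_ch, out_ch, stride=1):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        # Project the identity when the shape changes so the residual add is valid.
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1 else
                         nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch)))

    def forward(self, x):
        return torch.relu(self.branch(x) + self.shortcut(x))

# block1 of PR50 in Table 1: two bottlenecks with widths 8-8-32 replacing R50's
# single 7x7 convolution. Stride 2 on the first unit mirrors the stride-2 R50
# stem (an assumption; Table 1 does not list strides for PR50).
block1 = nn.Sequential(Bottleneck(3, 8, 32, stride=2), Bottleneck(32, 8, 32))
```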

    Table 2  Validation results on the KITTI dataset using the Eigen split (Abs Rel, Sq Rel, RMSE, RMSE ln: lower is better; $\delta$ thresholds: higher is better)

    | Method           | Supervision | Abs Rel | Sq Rel | RMSE  | RMSE ln | $\delta < 1.25$ | $\delta < {1.25^2}$ | $\delta < {1.25^3}$ |
    | Eigen[3]         | D   | 0.203 | 1.548 | 6.307 | 0.282 | 0.702 | 0.890 | 0.890 |
    | Liu[4]           | D   | 0.201 | 1.584 | 6.471 | 0.273 | 0.680 | 0.898 | 0.967 |
    | Klodt[19]        | D+M | 0.166 | 1.490 | 5.998 | –     | 0.778 | 0.919 | 0.966 |
    | Zhou[9]          | M   | 0.183 | 1.595 | 6.709 | 0.270 | 0.734 | 0.902 | 0.959 |
    | Struct2depth[20] | M   | 0.141 | 1.026 | 5.291 | 0.215 | 0.816 | 0.945 | 0.979 |
    | Garg[10]         | S   | 0.152 | 1.226 | 5.849 | 0.246 | 0.784 | 0.921 | 0.967 |
    | StrAT[21]        | S   | 0.128 | 1.019 | 5.403 | 0.227 | 0.827 | 0.935 | 0.971 |
    | Monodepth2[22]   | S   | 0.130 | 1.144 | 5.485 | 0.232 | 0.831 | 0.932 | 0.968 |
    | Monodepth+pp[11] | S   | 0.128 | 1.038 | 5.355 | 0.223 | 0.833 | 0.939 | 0.972 |
    | 3Net+pp[23]      | S   | 0.126 | 0.961 | 5.205 | 0.220 | 0.835 | 0.941 | 0.974 |
    | Ours             | S   | 0.131 | 1.110 | 5.426 | 0.224 | 0.839 | 0.941 | 0.972 |
    | Ours + pp        | S   | 0.122 | 0.939 | 5.063 | 0.212 | 0.850 | 0.947 | 0.976 |
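Table 2 reports the standard depth-estimation metrics introduced by Eigen et al. [3]. For reference, a compact NumPy sketch of those metrics is given below; the function name is illustrative, and ground truth and predictions are assumed to be positive depth values already restricted to the valid evaluation pixels.

```python
import numpy as np

def depth_metrics(gt, pred):
    """Compute Abs Rel, Sq Rel, RMSE, RMSE ln, and the delta accuracies of Table 2.

    gt and pred are 1-D arrays of positive depths over the valid pixels."""
    thresh = np.maximum(gt / pred, pred / gt)
    d1 = (thresh < 1.25).mean()
    d2 = (thresh < 1.25 ** 2).mean()
    d3 = (thresh < 1.25 ** 3).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean((gt - pred) ** 2 / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_ln = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_ln, d1, d2, d3
```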

    Table 3  Results of ablation experiments on the KITTI dataset

    | Method                 | Encoder | Parameters (×10⁶) | Abs Rel | $\delta < 1.25$ | Prediction speed (fps) |
    | Baseline               | R50  | 58.5  | 0.143 | 0.812 | 21 |
    | Baseline + DFFL        | R50  | 158.0 | 0.135 | 0.833 | 11 |
    | Network pruning + DFFL | PR50 | 39.4  | 0.131 | 0.839 | 21 |
    | Baseline               | R18  | 20.2  | 0.149 | 0.794 | 46 |
    | Baseline + DFFL        | R18  | 20.4  | 0.139 | 0.820 | 22 |
    | Network pruning + DFFL | PR18 | 19.1  | 0.129 | 0.835 | 33 |

    Table 4  Results of ablation experiments with the three types of input

    | Method                         | Encoder | Abs Rel | $\delta < 1.25$ |
    | Baseline                       | PR50 | 0.137 | 0.821 |
    | Baseline + DFFL (input type 1) | PR50 | 0.161 | 0.828 |
    | Baseline + DFFL (input type 2) | PR50 | 0.132 | 0.836 |
    | Baseline + DFFL (input type 3) | PR50 | 0.131 | 0.839 |
  • [1] SNAVELY N, SEITZ S M, and SZELISKI R. Skeletal graphs for efficient structure from motion[C]. 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, USA, 2008: 1–8.
    [2] DI Hongwei, CHAI Ying, and LI Kui. A fast binocular vision stereo matching algorithm[J]. Acta Optica Sinica, 2009, 29(8): 2180–2184. doi: 10.3788/AOS20092908.2180
    [3] EIGEN D, PUHRSCH C, and FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2366–2374.
    [4] LIU Fayao, SHEN Chunhua, LIN Guosheng, et al. Learning depth from single monocular images using deep convolutional neural fields[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(10): 2024–2039. doi: 10.1109/TPAMI.2015.2505283
    [5] LAINA I, RUPPRECHT C, BELAGIANNIS V, et al. Deeper depth prediction with fully convolutional residual networks[C]. The 2016 4th International Conference on 3D Vision, Stanford, USA, 2016: 239–248.
    [6] HE Kaiming, ZHANG Xiangyu, REN Shaoqing, et al. Deep residual learning for image recognition[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 770–778.
    [7] ZHOU Wujie, PAN Ting, GU Pengli, et al. Depth estimation of monocular road images based on pyramid scene analysis network[J]. Journal of Electronics & Information Technology, 2019, 41(10): 2509–2515. doi: 10.11999/JEIT180957
    [8] ZHAO Shanshan, FU Huan, GONG Mingming, et al. Geometry-aware symmetric domain adaptation for monocular depth estimation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 9780–9790.
    [9] ZHOU Tinghui, BROWN M, SNAVELY N, et al. Unsupervised learning of depth and ego-motion from video[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6612–6619.
    [10] GARG R, B G V K, CARNEIRO G, et al. Unsupervised CNN for single view depth estimation: Geometry to the rescue[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 740–756.
    [11] GODARD C, MAC AODHA O, and BROSTOW G J. Unsupervised monocular depth estimation with left-right consistency[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 6602–6611.
    [12] ZHOU Zongwei, SIDDIQUEE M R, TAJBAKHSH N, et al. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation[J]. IEEE Transactions on Medical Imaging, 2020, 39(6): 1856–1867. doi: 10.1109/TMI.2019.2959609
    [13] GEIGER A, LENZ P, and URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]. 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, USA, 2012: 3354–3361.
    [14] HARTLEY R and ZISSERMAN A. Multiple View Geometry in Computer Vision[M]. 2nd ed. New York: Cambridge University Press, 2003: 262–263.
    [15] RONNEBERGER O, FISCHER P, and BROX T. U-net: Convolutional networks for biomedical image segmentation[C]. The 18th International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 2015: 234–241.
    [16] HUANG Gao, LIU Zhuang, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 2261–2269.
    [17] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2818–2826.
    [18] WANG Zhou, BOVIK A C, SHEIKH H R, et al. Image quality assessment: From error visibility to structural similarity[J]. IEEE Transactions on Image Processing, 2004, 13(4): 600–612. doi: 10.1109/TIP.2003.819861
    [19] KLODT M and VEDALDI A. Supervising the new with the old: Learning SFM from SFM[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 713–728.
    [20] CASSER V, PIRK S, MAHJOURIAN R, et al. Depth prediction without the sensors: Leveraging structure for unsupervised learning from monocular videos[C]. The 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 8001–8008.
    [21] MEHTA I, SAKURIKAR P, and NARAYANAN P J. Structured adversarial training for unsupervised monocular depth estimation[C]. 2018 International Conference on 3D Vision, Verona, Italy, 2018: 314–323.
    [22] GODARD C, MAC AODHA O, FIRMAN M, et al. Digging into self-supervised monocular depth estimation[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 3827–3837.
    [23] POGGI M, TOSI F, and MATTOCCIA S. Learning monocular depth estimation with unsupervised trinocular assumptions[C]. 2018 International Conference on 3D Vision, Verona, Italy, 2018: 324–333.
Publication history
  • Received: 2020-07-17
  • Revised: 2020-12-29
  • Published online: 2021-02-03
  • Issue published: 2021-10-18
