
Single-view 3D Reconstruction Algorithm Based on View-aware

Nian WANG, Xuyang HU, Fan ZHU, Jun TANG

Citation: Nian WANG, Xuyang HU, Fan ZHU, Jun TANG. Single-view 3D Reconstruction Algorithm Based on View-aware[J]. Journal of Electronics & Information Technology, 2020, 42(12): 3053-3060. doi: 10.11999/JEIT190986


doi: 10.11999/JEIT190986
Funds: The National Natural Science Foundation of China (61772032)
Details
    About the authors:

    WANG Nian: Male, born in 1966, Ph.D., Professor. His main research interests include pattern recognition and image processing

    HU Xuyang: Male, born in 1995, master's student. His research interests include image generation and 3D reconstruction

    ZHU Fan: Male, born in 1987, Ph.D. His main research interest is computer vision

    TANG Jun: Male, born in 1977, Ph.D., Professor. His main research interests include pattern recognition and computer vision

    Corresponding author:

    TANG Jun, tangjunahu@163.com

  • CLC number: TN911.73; TP301.6

  • Abstract: Although projecting a 3D shape onto a 2D view appears irreversible, since a dimension is discarded, interest in 3D reconstruction technology is growing rapidly in vertical industries ranging from visualization to computer-aided geometric design. Traditional 3D reconstruction algorithms based on object depth maps or RGB images can achieve satisfactory results in some respects, but they still face several problems: (1) they learn the mapping between 2D views and 3D shapes crudely; (2) they cannot cope with the appearance differences of an object across viewpoints; (3) they require images of the object from multiple viewpoints. This paper proposes an end-to-end View-Aware 3D (VA3D) reconstruction network that addresses these problems. Specifically, VA3D consists of a multi-neighboring-view synthesis subnetwork and a 3D reconstruction subnetwork. The multi-neighboring-view synthesis subnetwork generates images from several viewpoints adjacent to the object's source view, and an adaptive fusion module is introduced to resolve the blurring and distortion that arise during viewpoint translation. The 3D reconstruction subnetwork then uses a recurrent neural network to recover the object's 3D shape from the synthesized multi-view sequence. Extensive qualitative and quantitative experiments on the ShapeNet dataset show that VA3D effectively improves single-view 3D reconstruction results.
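The abstract describes a two-stage pipeline: a view-synthesis stage that generates neighboring views from the single input image, and a recurrent reconstruction stage that fuses the resulting view sequence into a 3D shape. The PyTorch sketch below illustrates that data flow only; every module name, layer choice, and size here (ViewSynthesisNet, Reconstruction3DNet, the 32^3 voxel output) is an illustrative assumption, not the paper's published architecture, which is detailed in its Figures 1 and 2.

```python
# Minimal sketch of the two-stage VA3D data flow described in the abstract.
# All module internals and names are assumptions made for illustration.
import torch
import torch.nn as nn

class ViewSynthesisNet(nn.Module):
    """Stage 1 (MSN): synthesize K neighboring views from the source view."""
    def __init__(self, num_views=4):
        super().__init__()
        self.num_views = num_views
        # Placeholder encoder-decoder; the paper uses a GAN generator (Fig. 2).
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * num_views, 3, padding=1), nn.Tanh(),
        )

    def forward(self, src):                       # src: (B, 3, H, W)
        out = self.body(src)                      # (B, 3*K, H, W)
        return out.view(src.size(0), self.num_views, 3, *src.shape[2:])

class Reconstruction3DNet(nn.Module):
    """Stage 2: fuse the view sequence with a recurrent unit, decode voxels."""
    def __init__(self, feat_dim=256, voxel_res=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.decoder = nn.Linear(feat_dim, voxel_res ** 3)
        self.voxel_res = voxel_res

    def forward(self, views):                     # views: (B, K, 3, H, W)
        b, k = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(b, k, -1)
        _, h = self.rnn(feats)                    # last hidden state fuses views
        occ = torch.sigmoid(self.decoder(h[-1]))  # occupancy probabilities
        return occ.view(b, self.voxel_res, self.voxel_res, self.voxel_res)

# Usage: source view -> synthesized neighbors -> voxel grid
src = torch.randn(2, 3, 128, 128)
views = torch.cat([src.unsqueeze(1), ViewSynthesisNet()(src)], dim=1)
voxels = Reconstruction3DNet()(views)             # (2, 32, 32, 32)
```

One design point the sketch does mirror: because a recurrent unit fuses the per-view features, the reconstruction subnetwork can consume the source view plus any number of synthesized neighbors with the same weights.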
  • Figure 1  View-aware 3D reconstruction

    Figure 2  Architecture of the MSN generator

    Figure 3  Adaptive fusion

    Figure 4  Sample results for qualitative comparison

    Figure 5  Comparison of multi-views generated by SA3D and VA3D

    Figure 6  IoU and F-score for different numbers of synthesized views

    Table 1  Quantitative comparison results

    Category     IoU                                F-score
                 3D-R2N2_1   3D-R2N2_5   VA3D       3D-R2N2_1   3D-R2N2_5   VA3D
    Cabinet      0.7299      0.7839      0.7915     0.8267      0.8651      0.8694
    Car          0.8123      0.8551      0.8530     0.8923      0.9190      0.9178
    Chair        0.4958      0.5802      0.5643     0.6404      0.7155      0.6995
    Airplane     0.5560      0.6228      0.6385     0.7006      0.7561      0.7641
    Table        0.5297      0.6061      0.6128     0.6717      0.7362      0.7386
    Bench        0.4621      0.5566      0.5533     0.6115      0.6991      0.6936
    Mean         0.5976      0.6674      0.6689     0.7238      0.7818      0.7805
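Table 1 reports voxel IoU and F-score. For reference, the sketch below shows how these metrics are commonly computed on binary occupancy grids; the 0.5 occupancy threshold is an assumption, and the paper's exact evaluation protocol may differ.

```python
# Hedged sketch: voxel IoU and F-score as commonly computed on binary
# occupancy grids. The 0.5 threshold is an assumption, not from the paper.
import torch

def voxel_iou(pred_prob, gt, thresh=0.5):
    """IoU between the thresholded prediction and ground-truth occupancy."""
    pred, gt = pred_prob > thresh, gt.bool()
    inter = (pred & gt).sum().float()
    union = (pred | gt).sum().float()
    return (inter / union.clamp(min=1)).item()

def voxel_fscore(pred_prob, gt, thresh=0.5):
    """Harmonic mean of per-voxel precision and recall."""
    pred, gt = pred_prob > thresh, gt.bool()
    tp = (pred & gt).sum().float()
    precision = tp / pred.sum().clamp(min=1)
    recall = tp / gt.sum().clamp(min=1)
    return (2 * precision * recall / (precision + recall).clamp(min=1e-8)).item()
```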

    Table 2  Comparison with the SA3D algorithm

    Algorithm    Mean IoU
    SA3D         0.6162
    VA3D         0.6741

    Table 3  Effect of different output strategies in MSN

    Model                                SSIM      PSNR      IoU       F-score
    Only $\{\tilde{I}_r\}^{\rm C}$       0.8035    19.8042   0.6525    0.7649
    Only $\{\tilde{I}_f\}^{\rm C}$       0.8435    20.5273   0.6530    0.7646
    Adaptive fusion                      0.8488    20.6203   0.6554    0.7672
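Table 3 shows that adaptively fusing the two MSN outputs outperforms using either $\{\tilde{I}_r\}^{\rm C}$ or $\{\tilde{I}_f\}^{\rm C}$ alone. A common formulation of such a module is sketched below, assuming a learned per-pixel mask $M$ that blends the candidates as $\tilde{I} = M \odot \tilde{I}_r + (1 - M) \odot \tilde{I}_f$; the mask predictor here is illustrative, and the paper's actual module is shown in its Figure 3.

```python
# Hedged sketch of adaptive fusion: a learned per-pixel mask M blends two
# candidate images as I = M * I_r + (1 - M) * I_f. The mask predictor's
# layers are assumptions; the paper's module is depicted in its Figure 3.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.mask_net = nn.Sequential(           # predicts M from both candidates
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid(),  # M in [0, 1]
        )

    def forward(self, i_r, i_f):                 # both: (B, 3, H, W)
        m = self.mask_net(torch.cat([i_r, i_f], dim=1))
        return m * i_r + (1 - m) * i_f           # per-pixel convex combination
```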

    Table 4  Variance of reconstruction results

    Model                              $\sigma^2_{\rm IoU}$    $\sigma^2_{\text{F-score}}$
    Number of synthesized views = 0    0.0057                  0.0061
    Number of synthesized views = 4    0.0051                  0.0054

    Table 5  Combinations of different loss functions

    Model                                                 SSIM      PSNR      IoU       F-score
    Without reconstruction loss $\mathcal{L}_{\rm rec}$   0.8462    20.2693   0.6540    0.7658
    Without adversarial loss $\mathcal{L}_{\rm adv}$      0.8516    21.4385   0.6539    0.7651
    Without perceptual loss $\mathcal{L}_{\rm per}$       0.8416    20.3141   0.6525    0.7645
    All losses                                            0.8488    20.6203   0.6554    0.7672
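Table 5 ablates three loss terms, implying a total objective of the form $\mathcal{L} = \lambda_{\rm rec}\mathcal{L}_{\rm rec} + \lambda_{\rm adv}\mathcal{L}_{\rm adv} + \lambda_{\rm per}\mathcal{L}_{\rm per}$. The sketch below assumes common instantiations of each term: an L1 reconstruction loss, the least-squares GAN generator loss (following the cited LSGAN paper), and a VGG-feature distance for the perceptual loss (following the cited super-resolution work); the weights are placeholders, not the paper's values.

```python
# Hedged sketch of the three-term objective ablated in Table 5. The term
# forms and weights are assumptions: L1 for L_rec, the least-squares GAN
# generator loss for L_adv, and a VGG-feature distance for L_per.
import torch
import torch.nn.functional as F

def total_loss(fake, real, d_fake, feat_fake, feat_real,
               w_rec=1.0, w_adv=0.01, w_per=0.1):        # placeholder weights
    l_rec = F.l1_loss(fake, real)                        # pixel reconstruction
    l_adv = F.mse_loss(d_fake, torch.ones_like(d_fake))  # LSGAN generator term
    l_per = F.mse_loss(feat_fake, feat_real)             # perceptual (VGG feats)
    return w_rec * l_rec + w_adv * l_adv + w_per * l_per
```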
  • EIGEN D, PUHRSCH C, and FERGUS R. Depth map prediction from a single image using a multi-scale deep network[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2366–2374.
    WU Jiajun, WANG Yifan, XUE Tianfan, et al. MarrNet: 3D shape reconstruction via 2.5D sketches[C]. The 31st Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 540–550.
    WANG Nanyang, ZHANG Yinda, LI Zhuwen, et al. Pixel2Mesh: Generating 3D mesh models from single RGB images[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 55–71. doi: 10.1007/978-3-030-01252-6_4.
    TANG Jiapeng, HAN Xiaoguang, PAN Junyi, et al. A skeleton-bridged deep learning approach for generating meshes of complex topologies from single RGB images[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4536–4545. doi: 10.1109/cvpr.2019.00467.
    CHOY C B, XU Danfei, GWAK J Y, et al. 3D-R2N2: A unified approach for single and multi-view 3D object reconstruction[C]. The 14th European Conference on Computer Vision, Amsterdam, the Netherlands, 2016: 628–644. doi: 10.1007/978-3-319-46484-8_38.
    HU Xuyang, ZHU Fan, LIU Li, et al. Structure-aware 3D shape synthesis from single-view images[C]. 2018 British Machine Vision Conference, Newcastle, UK, 2018.
    GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
    ZHANG Jinglei and HOU Yawei. Image-to-image translation based on improved cycle-consistent generative adversarial network[J]. Journal of Electronics & Information Technology, 2020, 42(5): 1216–1222. doi: 10.11999/JEIT190407
    CHEN Ying and CHEN Huangkang. Speaker recognition based on multimodal generative adversarial nets with triplet-loss[J]. Journal of Electronics & Information Technology, 2020, 42(2): 379–385. doi: 10.11999/JEIT190154
    WANG Tingchun, LIU Mingyu, ZHU Junyan, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8798–8807. doi: 10.1109/cvpr.2018.00917.
    ULYANOV D, VEDALDI A, and LEMPITSKY V. Instance normalization: The missing ingredient for fast stylization[EB/OL]. https://arxiv.org/abs/1607.08022, 2016.
    XU Bing, WANG Naiyan, CHEN Tianqi, et al. Empirical evaluation of rectified activations in convolutional network[EB/OL]. https://arxiv.org/abs/1505.00853, 2015.
    GOKASLAN A, RAMANUJAN V, RITCHIE D, et al. Improving shape deformation in unsupervised image-to-image translation[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 662–678. doi: 10.1007/978-3-030-01258-8_40.
    MAO Xudong, LI Qing, XIE Haoran, et al. Least squares generative adversarial networks[C]. 2017 IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2813–2821. doi: 10.1109/iccv.2017.304.
    GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 5767–5777.
    LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 105–114. doi: 10.1109/CVPR.2017.19.
    SIMONYAN K and ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. https://arxiv.org/abs/1409.1556, 2014.
    KINGMA D P and BA J. Adam: A method for stochastic optimization[EB/OL]. https://arxiv.org/abs/1412.6980, 2014.
    CHANG A X, FUNKHOUSER T, GUIBAS L, et al. ShapeNet: An information-rich 3D model repository[EB/OL]. https://arxiv.org/abs/1512.03012, 2015.
    GRABNER A, ROTH P M, and LEPETIT V. 3D pose estimation and 3D model retrieval for objects in the wild[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 3022–3031. doi: 10.1109/cvpr.2018.00319.
    HE Xinwei, ZHOU Yang, ZHOU Zhichao, et al. Triplet-center loss for multi-view 3D object retrieval[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 1945–1954. doi: 10.1109/cvpr.2018.00208.
Publication history
  • Received: 2019-12-09
  • Revised: 2020-05-26
  • Published online: 2020-06-22
  • Issue published: 2020-12-08
