Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation for Stereoscopic Image Generation from a Single Image

ZHANG Chunlan, QU Yuwei, NIE Lang, LIN Chunyu

Citation: ZHANG Chunlan, QU Yuwei, NIE Lang, LIN Chunyu. Multi-Scale Deformable Alignment-Aware Bidirectional Gated Feature Aggregation for Stereoscopic Image Generation from a Single Image[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250760


doi: 10.11999/JEIT250760 cstr: 32379.14.JEIT250760
Funds: Science Research Project of Hebei Education Department (BJK2024146), Natural Science Foundation of Hebei Province (F2025111005), National Natural Science Foundation of China (62502057), Scientific Research Project of Hengshui University (2024XJPY04, 2023GC26)
Article information
    Author biographies:

    ZHANG Chunlan: Female, Ph.D., Lecturer. Research interests: computer vision, deep learning, artificial intelligence, view synthesis, distortion correction, and intelligent photonics

    QU Yuwei: Male, Ph.D., Lecturer. Research interests: computer vision, deep learning, artificial intelligence, view synthesis, and intelligent photonics

    NIE Lang: Male, Ph.D., Lecturer. Research interests: computer vision, deep learning, image and video stitching, image rectification, depth estimation, and view synthesis

    LIN Chunyu: Male, Ph.D., Professor. Research interests: computer vision, deep learning, image and video processing, and artificial intelligence

    Corresponding author:

    QU Yuwei, quyuwei@hsnc.edu.cn

  • CLC number: TN911.73; TP391.4

  • Abstract: Stereoscopic image generation from a single view typically relies on ground-truth depth as a prior and is prone to geometric misalignment, occlusion artifacts, and texture blurring. To address these issues, this paper proposes a multi-scale deformable alignment-aware bidirectional gated feature aggregation network for stereoscopic image generation, trained end to end without ground-truth depth supervision. The method introduces a Multi-Scale Deformable Alignment (MSDA) module that adaptively adjusts sampling positions according to image content and aligns source and target features across scales, mitigating the misalignment that fixed convolutions cannot accommodate under geometric deformation and disparity variation. A texture-structure Bidirectional Gated Feature Aggregation (Bi-GFA) module is further constructed, with a feature-decoupling strategy that constrains shallow layers to learn texture and deep layers to model structure, enabling dynamic complementarity and efficient fusion of the two. In addition, a Learnable Alignment-Guided (LAG) loss is designed to further improve feature-alignment accuracy and semantic consistency. Experiments on the KITTI, MPEG-FTV, and multi-view depth video sequence datasets show that the proposed method outperforms state-of-the-art methods in structural fidelity, texture quality, and view consistency.
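To make the MSDA idea concrete, below is a minimal PyTorch sketch of content-adaptive deformable alignment applied at several scales, assuming torchvision's DeformConv2d as the deformable operator. All class names, parameter names, and the scale set (DeformAlign, MSDASketch, scales=(1, 2, 4)) are hypothetical illustrations of the technique described in the abstract, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d


class DeformAlign(nn.Module):
    """Align source features to target features at a single scale.

    Offsets are predicted from the concatenated source/target features,
    so the sampling positions adapt to image content rather than
    following a fixed convolution grid.
    """
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # Predict 2 offsets (dx, dy) per kernel sample position.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.dconv = DeformConv2d(channels, channels, kernel_size, padding=pad)

    def forward(self, src, tgt):
        offset = self.offset_pred(torch.cat([src, tgt], dim=1))
        return self.dconv(src, offset)


class MSDASketch(nn.Module):
    """Run deformable alignment at several scales and merge the results."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.aligners = nn.ModuleList([DeformAlign(channels) for _ in scales])
        self.fuse = nn.Conv2d(len(scales) * channels, channels, 1)

    def forward(self, src, tgt):
        h, w = src.shape[-2:]
        outs = []
        for s, align in zip(self.scales, self.aligners):
            # Downsample both streams, align, then restore full resolution.
            src_s = F.avg_pool2d(src, s) if s > 1 else src
            tgt_s = F.avg_pool2d(tgt, s) if s > 1 else tgt
            out = align(src_s, tgt_s)
            if s > 1:
                out = F.interpolate(out, size=(h, w), mode='bilinear',
                                    align_corners=False)
            outs.append(out)
        return self.fuse(torch.cat(outs, dim=1))
```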
  • Figure 1  Overall network architecture

    Figure 2  MSDA module

    Figure 3  Bidirectional gated feature aggregation pipeline

    Figure 4  Qualitative comparison on the KITTI dataset

    Figure 5  Qualitative comparison on the MPEG-FTV dataset
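
Figure 3 outlines the Bi-GFA pipeline. As a rough illustration of bidirectional gating between a shallow texture stream and a deep structure stream, here is a hypothetical PyTorch sketch; it assumes both streams have already been brought to the same channel count and spatial resolution, and the names (BiGFASketch, gate_t, gate_s) are illustrative only.

```python
import torch
import torch.nn as nn


class BiGFASketch(nn.Module):
    """Bidirectional gated fusion of texture and structure features.

    Each direction predicts a sigmoid gate from both inputs, so the
    complementary stream is injected only where it is judged useful,
    approximating the dynamic complementarity described in the abstract.
    """
    def __init__(self, channels):
        super().__init__()
        self.gate_t = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_s = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.Sigmoid())
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, tex, struc):
        both = torch.cat([tex, struc], dim=1)
        # Structure -> texture: inject structural cues into the texture stream.
        tex_out = tex + self.gate_t(both) * struc
        # Texture -> structure: restore fine detail in the structure stream.
        struc_out = struc + self.gate_s(both) * tex
        return self.out(torch.cat([tex_out, struc_out], dim=1))
```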

    Table 1  Quantitative comparison on the KITTI dataset

    Test resolution | Method | No. of planes N | PSNR | SSIM | LPIPS
    384×128 | Tulsiani et al. [13] | NA | 16.500 | 0.572 | NA
    384×128 | Xie et al. [7] | NA | 17.352 | 0.598 | 0.205
    384×128 | Tucker et al. [11] | 32 | 19.500 | 0.733 | NA
    384×128 | Zhang et al. [9] | 32 | 21.558 | 0.760 | 0.127
    384×128 | Li et al. [12] | 256 | 21.900 | 0.828 | 0.112
    384×128 | Zhang et al. [6] | NA | 22.040 | 0.749 | 0.105
    384×128 | Szymanowicz et al. [22] | NA | 21.960 | 0.826 | 0.132
    384×128 | Fang et al. [23] | NA | 22.262 | 0.837 | 0.116
    384×128 | Ours | NA | 22.433 | 0.802 | 0.089

    Table 2  Comparison results on the MPEG-FTV and multi-view depth video sequence indoor datasets

    Method | N | MPEG-FTV-Dog PSNR/SSIM/LPIPS | MPEG-FTV-Champagne PSNR/SSIM/LPIPS | Akko & Kayo PSNR/SSIM/LPIPS
    Xie et al. [7] | NA | 16.581/0.223/0.318 | 18.678/0.712/0.277 | 19.852/0.662/0.119
    Tucker et al. [11] | 32 | 19.469/0.772/0.086 | 19.346/0.884/0.102 | 21.256/0.642/0.090
    Zhang et al. [6] | NA | 20.752/0.793/0.044 | 19.796/0.886/0.077 | 25.696/0.881/0.082
    Ours | NA | 21.881/0.886/0.027 | 22.838/0.896/0.059 | 26.827/0.902/0.073

    Method | N | Rena PSNR/SSIM/LPIPS | MPEG-FTV-Kendo PSNR/SSIM/LPIPS | MPEG-FTV-Balloon PSNR/SSIM/LPIPS
    Xie et al. [7] | NA | 27.109/0.875/0.099 | 22.008/0.803/0.177 | 21.146/0.705/0.213
    Tucker et al. [11] | 32 | 19.147/0.380/0.221 | 23.783/0.882/0.042 | 24.539/0.794/0.066
    Zhang et al. [6] | NA | 30.426/0.956/0.092 | 22.869/0.910/0.063 | 23.685/0.871/0.095
    Ours | NA | 31.259/0.961/0.080 | 25.828/0.952/0.019 | 25.371/0.941/0.018

    Table 3  Ablation results on the KITTI outdoor dataset and the MPEG-FTV-Dog indoor dataset

    Variant | KITTI PSNR/SSIM/LPIPS | MPEG-FTV-Dog PSNR/SSIM/LPIPS
    w/o ALL (MSDA, Bi-GFA, $ {\mathcal{L}}_{LAG} $) | 15.593/0.651/0.207 | 15.111/0.603/0.198
    w/o MSDA | 20.894/0.788/0.139 | 19.564/0.735/0.156
    w/o Bi-GFA | 21.698/0.809/0.123 | 20.555/0.791/0.132
    w/o $ {\mathcal{L}}_{LAG} $ | 22.838/0.857/0.118 | 21.002/0.814/0.029
    Ours (full) | 22.968/0.863/0.101 | 21.881/0.886/0.027
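
The last ablation rows isolate the learnable alignment-guided loss $ {\mathcal{L}}_{LAG} $. The page does not spell out its form, so the sketch below is only one plausible reading under stated assumptions: a small learnable head predicts a per-pixel confidence that re-weights the feature distance between aligned and target features, with a log penalty keeping the confidence from collapsing to zero. The class name, the L1 distance, and the 0.1 regularization weight are all assumptions, not the paper's definition.

```python
import torch
import torch.nn as nn


class LAGLossSketch(nn.Module):
    """One plausible learnable alignment-guided loss (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        # Learnable head: per-pixel alignment confidence in (0, 1).
        self.conf = nn.Sequential(
            nn.Conv2d(2 * channels, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, aligned, target):
        c = self.conf(torch.cat([aligned, target], dim=1))
        # L1 feature distance, averaged over channels.
        dist = (aligned - target).abs().mean(dim=1, keepdim=True)
        # Confidence-weighted distance plus a log barrier so the network
        # cannot trivially drive the confidence to zero everywhere.
        return (c * dist - 0.1 * torch.log(c + 1e-6)).mean()
```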
  • [1] SHEN Qiuhong, WU Zike, YI Xuanyu, et al. Gamba: Marry Gaussian splatting with mamba for single-view 3D reconstruction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025: 1–14. doi: 10.1109/TPAMI.2025.3569596.
    [2] JIANG Lei, SCHAEFER G, and MENG Qinggang. Multi-scale feature fusion for single image novel view synthesis[J]. Neurocomputing, 2024, 599: 128081. doi: 10.1016/j.neucom.2024.128081.
    [3] WILES O, GKIOXARI G, SZELISKI R, et al. SynSin: End-to-end view synthesis from a single image[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 7465–7475. doi: 10.1109/CVPR42600.2020.00749.
    [4] ZHOU Yang, CAI Maomao, HUANG Xiaofeng, et al. Hole filling for virtual view synthesized image by combining with contextual feature fusion[J]. Journal of Electronics & Information Technology, 2024, 46(4): 1479–1487. doi: 10.11999/JEIT230181.
    [5] ZHENG Chuanxia, CHAM T J, and CAI Jianfei. Pluralistic image completion[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 1438–1447. doi: 10.1109/CVPR.2019.00153.
    [6] ZHANG Chunlan, LIN Chunyu, LIAO Kang, et al. As-deformable-as-possible single-image-based view synthesis without depth prior[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(8): 3989–4001. doi: 10.1109/TCSVT.2023.3237815.
    [7] XIE Junyuan, GIRSHICK R, and FARHADI A. Deep3D: Fully automatic 2D-to-3D video conversion with deep convolutional neural networks[C]. 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 842–857. doi: 10.1007/978-3-319-46493-0_51.
    [8] ZHOU Tinghui, TULSIANI S, SUN Weilun, et al. View synthesis by appearance flow[C]. 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 286–301. doi: 10.1007/978-3-319-46493-0_18.
    [9] ZHANG Chunlan, LIN Chunyu, LIAO Kang, et al. SivsFormer: Parallax-aware transformers for single-image-based view synthesis[C]. 2022 IEEE Conference on Virtual Reality and 3D User Interfaces, Christchurch, New Zealand, 2022: 47–56. doi: 10.1109/VR51125.2022.00022.
    [10] FLYNN J, NEULANDER I, PHILBIN J, et al. Deep stereo: Learning to predict new views from the world's imagery[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 5515–5524. doi: 10.1109/CVPR.2016.595.
    [11] TUCKER R and SNAVELY N. Single-view view synthesis with multiplane images[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 548–557. doi: 10.1109/CVPR42600.2020.00063.
    [12] LI Jiaxin, FENG Zijian, SHE Qi, et al. MINE: Towards continuous depth MPI with NeRF for novel view synthesis[C]. Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, Canada, 2021: 12558–12568. doi: 10.1109/ICCV48922.2021.01235.
    [13] TULSIANI S, TUCKER R, and SNAVELY N. Layer-structured 3D scene inference via view synthesis[C]. Proceedings of the 15th European Conference on Computer Vision, Munich, Germany, 2018: 311–327. doi: 10.1007/978-3-030-01234-2_19.
    [14] MILDENHALL B, SRINIVASAN P P, TANCIK M, et al. NeRF: Representing scenes as neural radiance fields for view synthesis[J]. Communications of the ACM, 2022, 65(1): 99–106. doi: 10.1145/3503250.
    [15] SUN Wenbo, GAO Zhi, ZHANG Yichen, et al. Geometrically consistent based neural radiance field for satellite city scene rendering and digital surface model generation in sparse viewpoints[J]. Journal of Electronics & Information Technology, 2025, 47(6): 1679–1689. doi: 10.11999/JEIT240898.
    [16] LUO Dengyan, XIANG Yanping, WANG Hu, et al. Deformable Feature Alignment and Refinement for moving infrared small target detection[J]. Pattern Recognition, 2026, 169: 111894. doi: 10.1016/j.patcog.2025.111894.
    [17] YAO Tingting, ZHAO Hengxin, FENG Zihao, et al. A context-aware multiple receptive field fusion network for oriented object detection in remote sensing images[J]. Journal of Electronics & Information Technology, 2025, 47(1): 233–243. doi: 10.11999/JEIT240560.
    [18] FIAZ M, NOMAN M, CHOLAKKAL H, et al. Guided-attention and gated-aggregation network for medical image segmentation[J]. Pattern Recognition, 2024, 156: 110812. doi: 10.1016/j.patcog.2024.110812.
    [19] GUO Shuai, HU Jingchuan, ZHOU Kai, et al. Real-time free viewpoint video synthesis system based on DIBR and a depth estimation network[J]. IEEE Transactions on Multimedia, 2024, 26: 6701–6716. doi: 10.1109/TMM.2024.3355639.
    [20] NIKLAUS S, MAI Long, YANG Jimei, et al. 3D Ken Burns effect from a single image[J]. ACM Transactions on Graphics, 2019, 38(6): 184. doi: 10.1145/3355089.3356528.
    [21] YU Jiahui, LIN Zhe, YANG Jimei, et al. Generative image inpainting with contextual attention[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5505–5514. doi: 10.1109/CVPR.2018.00577.
    [22] SZYMANOWICZ S, INSAFUTDINOV E, ZHENG Chuanxia, et al. Flash3D: Feed-forward generalisable 3D scene reconstruction from a single image[C]. 2025 International Conference on 3D Vision, Singapore, Singapore, 2025: 670–681. doi: 10.1109/3DV66043.2025.00067.
    [23] FANG Kun, ZHANG Qinghui, WAN Chenxia, et al. Single view generalizable 3D reconstruction based on 3D Gaussian splatting[J]. Scientific Reports, 2025, 15(1): 18468. doi: 10.1038/s41598-025-03200-7.
Publication history
  • Revised: 2026-01-26
  • Accepted: 2026-01-26
  • Published online: 2026-02-12
