Hole Filling for Virtual View Synthesized Image by Combining with Contextual Feature Fusion

ZHOU Yang, CAI Maomao, HUANG Xiaofeng, YIN Haibing

Citation: ZHOU Yang, CAI Maomao, HUANG Xiaofeng, YIN Haibing. Hole Filling for Virtual View Synthesized Image by Combining with Contextual Feature Fusion[J]. Journal of Electronics & Information Technology, 2024, 46(4): 1479-1487. doi: 10.11999/JEIT230181


doi: 10.11999/JEIT230181
Funds: The Natural Science Foundation of Zhejiang Province (LY21F020021), The National Natural Science Foundation of China (61972123, 61901150)
Article information
    Author biographies:

    ZHOU Yang: male, Associate Professor; research interests include immersive video coding

    CAI Maomao: male, M.S. student; research interest is virtual view synthesis

    HUANG Xiaofeng: male, Associate Professor; research interests include video coding and deep learning

    YIN Haibing: male, Professor; research interests include intelligent video and image compression and processing

    Corresponding author:

    ZHOU Yang, zhouyang@hdu.edu.cn

  • CLC number: TN911.73; TP391.41

  • Abstract: Because of foreground occlusion in the reference texture views and the viewing-angle disparity between viewpoints, depth-image-based virtual view synthesis produces a large number of holes. Previous hole-filling methods are time-consuming, and the filled regions lack texture consistency with the synthesized image. This paper first preprocesses the depth map to reduce foreground penetration during hole filling. Then, for the holes in the synthesized image output by 3D warping, an image generation network based on the generative adversarial network (GAN) architecture is designed to fill them. The network consists of two sub-networks: the first stage generates the texture structure of the hole regions, and the second stage uses an attention module with contextual feature fusion to improve hole-filling quality. The proposed model effectively suppresses the artifacts that tend to appear in hole-filled regions when foreground objects in the virtual view move rapidly. Experimental results on multi-view-plus-depth sequences show that the proposed method outperforms existing hole-filling methods for virtual view synthesis in both objective and subjective quality.
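    The abstract outlines a coarse-to-fine pipeline: the warped view and its hole mask enter a first-stage generator that recovers rough texture structure, and a second stage refines the holes by fusing local dilated-convolution features with features borrowed from similar known regions through attention. The PyTorch code below is only a minimal sketch of that general idea, not the authors' CFF-Net: the class names (ContextualAttention, CoarseToFineInpaintNet), channel width, and layer layout are hypothetical placeholders, and the discriminator and training losses are omitted.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ContextualAttention(nn.Module):
        # Simplified pixel-wise attention: hole features are rebuilt as a
        # cosine-similarity-weighted sum of features at known (non-hole) pixels.
        def forward(self, feat, mask):
            # feat: (B, C, H, W); mask: (B, 1, H, W), 1 = hole, 0 = known.
            b, c, h, w = feat.shape
            f = feat.flatten(2)                                   # (B, C, H*W)
            fn = F.normalize(f, dim=1)
            sim = torch.bmm(fn.transpose(1, 2), fn)               # (B, HW, HW) cosine similarities
            known = (1.0 - mask).flatten(2)                       # (B, 1, HW), 1 at known pixels
            sim = sim.masked_fill(known < 0.5, float('-inf'))     # attend to known pixels only
            attn = torch.softmax(sim, dim=-1)                     # assumes at least one known pixel
            out = torch.bmm(f, attn.transpose(1, 2)).view(b, c, h, w)
            return feat * (1.0 - mask) + out * mask               # replace hole features only

    class CoarseToFineInpaintNet(nn.Module):
        # Two-stage generator: stage 1 predicts a coarse result; stage 2 fuses a
        # dilated-convolution branch with the attention branch (a rough stand-in
        # for the contextual-feature-fusion idea) to refine the hole regions.
        def __init__(self, ch=32):
            super().__init__()
            def conv(cin, cout, d=1):
                return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=d, dilation=d), nn.ELU())
            self.coarse = nn.Sequential(conv(4, ch), conv(ch, ch), nn.Conv2d(ch, 3, 3, padding=1))
            self.encoder = nn.Sequential(conv(4, ch), conv(ch, ch))
            self.dilated = nn.Sequential(conv(ch, ch, d=2), conv(ch, ch, d=4))
            self.attention = ContextualAttention()
            self.fuse = nn.Sequential(conv(2 * ch, ch), nn.Conv2d(ch, 3, 3, padding=1))

        def forward(self, warped_rgb, hole_mask):
            # warped_rgb: 3D-warped view in [0, 1]; hole_mask: 1 inside holes.
            x = torch.cat([warped_rgb * (1.0 - hole_mask), hole_mask], dim=1)
            coarse = torch.sigmoid(self.coarse(x))                # stage 1: rough structure
            x2 = torch.cat([coarse * hole_mask + warped_rgb * (1.0 - hole_mask), hole_mask], dim=1)
            feat = self.encoder(x2)                               # stage 2: refinement
            ctx = self.attention(feat, F.interpolate(hole_mask, size=feat.shape[2:]))
            out = torch.sigmoid(self.fuse(torch.cat([self.dilated(feat), ctx], dim=1)))
            return warped_rgb * (1.0 - hole_mask) + out * hole_mask   # keep known pixels untouched

    # Toy usage: a 64x64 frame with roughly 10% of its pixels marked as holes.
    net = CoarseToFineInpaintNet()
    rgb = torch.rand(1, 3, 64, 64)
    mask = (torch.rand(1, 1, 64, 64) > 0.9).float()
    filled = net(rgb, mask)    # (1, 3, 64, 64)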
  • Figure 1  Block diagram of the virtual view rendering pipeline

    Figure 2  Comparison of rendered images before and after foreground-contour dilation of the depth map

    Figure 3  Architecture of CFF-Net

    Figure 4  Mask generation

    Figure 5  Illustration of the contextual feature fusion attention model

    Figure 6  Frame-by-frame performance comparison of different methods

    Figure 7  Subjective quality comparison of different methods

    Figure 8  Comparison of local regions under different attention modules

    Table 1  Dataset details

    Name        Resolution  Total frames  Baseline (mm)  Inpainted frames  Train:Test
    Ballet      1024×768    800           380            100               9:1
    Breakdance  1024×768    800           370            100               9:1

    Table 2  Average PSNR (↑) of different methods on each test sequence

    Method      BA41    BA43    BA52    BA54    BR56    BR57
    Cri [2]     22.057  28.019  23.639  30.471  29.258  27.779
    Ahn [6]     23.357  27.522  24.681  30.820  29.618  27.797
    Luo [8]     24.077  27.996  24.661  32.104  29.552  27.955
    VSRS [23]   23.039  27.615  24.361  29.089  29.432  28.177
    Df2 [16]    22.923  29.214  24.421  32.524  29.593  28.029
    Gla [22]    23.674  28.823  25.015  32.284  29.599  28.279
    Proposed    23.576  29.472  25.111  33.227  29.802  28.325

    Table 3  Average SSIM (↑) of different methods on each test sequence

    Method      BA41   BA43   BA52   BA54   BR56   BR57
    Cri [2]     0.757  0.873  0.764  0.878  0.806  0.785
    Ahn [6]     0.770  0.863  0.761  0.873  0.791  0.791
    Luo [8]     0.792  0.871  0.773  0.881  0.790  0.781
    VSRS [23]   0.782  0.867  0.787  0.879  0.810  0.791
    Df2 [16]    0.768  0.873  0.774  0.887  0.806  0.791
    Gla [22]    0.759  0.872  0.766  0.883  0.806  0.792
    Proposed    0.767  0.875  0.787  0.889  0.812  0.798

    Table 4  Performance comparison of different attention modules

    Metric  Df2     Gla     E2F     Proposed
    PSNR    33.105  32.968  31.452  33.227
    SSIM    0.887   0.887   0.885   0.889
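    For context, the PSNR and SSIM values in Tables 2-4 are standard full-reference quality metrics computed against the ground-truth view. The short sketch below shows one common way to compute them with scikit-image; the random arrays are placeholders, and the exact evaluation protocol (cropping, which frames are scored) follows the paper, not this snippet.

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    # Placeholder arrays standing in for the captured reference view and the
    # synthesized, hole-filled view of the same frame (H x W x 3, 8-bit).
    reference = np.random.randint(0, 256, (768, 1024, 3), dtype=np.uint8)
    synthesized = np.random.randint(0, 256, (768, 1024, 3), dtype=np.uint8)

    psnr = peak_signal_noise_ratio(reference, synthesized, data_range=255)
    ssim = structural_similarity(reference, synthesized, channel_axis=-1, data_range=255)
    print(f"PSNR = {psnr:.3f} dB, SSIM = {ssim:.3f}")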
  • [1] DE OLIVEIRA A Q, DA SILVEIRA T L T, WALTER M, et al. A hierarchical superpixel based approach for DIBR view synthesis[J]. IEEE Transactions on Image Processing, 2021, 30: 6408–6419. doi: 10.1109/TIP.2021.3092817.
    [2] CRIMINISI A, PEREZ P, and TOYAMA K. Region filling and object removal by exemplar-based image inpainting[J]. IEEE Transactions on Image Processing, 2004, 13(9): 1200–1212. doi: 10.1109/TIP.2004.833105.
    [3] ZHU Ce and LI Shuai. Depth image based view synthesis: New insights and perspectives on Hole generation and filling[J]. IEEE Transactions on Broadcasting, 2016, 62(1): 82–93. doi: 10.1109/TBC.2015.2475697.
    [4] CHANG Yuan, CHEN Yisong, and WANG Guoping. Range guided depth refinement and uncertainty-aware aggregation for view synthesis[C]. International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, 2021: 2290-2294. doi: 10.1109/ICASSP39728.2021.9413981.
    [5] CHENG Cong, LIU Ju, YUAN Hui, et al. A DIBR method based on inverse mapping and depth-aided image inpainting[C]. 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China, 2013: 518-522. doi: 10.1109/ChinaSIP.2013.6625394.
    [6] AHN I and KIM C. A novel depth-based virtual view synthesis method for free viewpoint video[J]. IEEE Transactions on Broadcasting, 2013, 59(4): 614–626. doi: 10.1109/TBC.2013.2281658.
    [7] RAHAMAN D M M and PAUL M. Virtual view synthesis for free viewpoint video and Multiview video compression using Gaussian mixture modelling[J]. IEEE Transactions on Image Processing, 2018, 27(3): 1190–1201. doi: 10.1109/TIP.2017.2772858.
    [8] LUO Guibo, ZHU Yuesheng, WENG Zhenyu, et al. A disocclusion inpainting framework for depth-based view synthesis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(6): 1289–1302. doi: 10.1109/TPAMI.2019.2899837.
    [9] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. Proceedings of the 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680. doi: 10.5555/2969033.2969125.
    [10] RADFORD A, METZ L, and CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[C]. 4th International Conference on Learning Representations, Puerto Rico, 2016: 1–16.
    [11] IIZUKA S, SIMO-SERRA E, and ISHIKAWA H. Globally and locally consistent image completion[J]. ACM Transactions on Graphics, 2017, 36(4): 107. doi: 10.1145/3072959.3073659.
    [12] LI Jingyuan, WANG Ning, ZHANG Lefei, et al. Recurrent feature reasoning for image inpainting[C]. Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 7757–7765. doi: 10.1109/CVPR42600.2020.00778.
    [13] XU Shunxin, LIU Dong, and XIONG Zhiwei. E2I: Generative inpainting from edge to image[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2021, 31(4): 1308–1322. doi: 10.1109/TCSVT.2020.3001267.
    [14] SHIN Y G, SAGONG M C, YEO Y J, et al. PEPSI++: Fast and lightweight network for image inpainting[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 252–265. doi: 10.1109/TNNLS.2020.2978501.
    [15] SUN Lei, YANG Yu, MAO Xiuqing, et al. Data generation based on generative adversarial network with spatial features[J]. Journal of Electronics & Information Technology, 2023, 45(6): 1959–1969. doi: 10.11999/JEIT211285. (in Chinese)
    [16] YU Jiahui, LIN Zhe, YANG Jimei, et al. Free-form image inpainting with gated convolution[C]. 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Korea (South), 2019: 4470–4479. doi: 10.1109/ICCV.2019.00457.
    [17] PATHAK D, KRÄHENBÜHL P, DONAHUE J, et al. Context encoders: Feature learning by inpainting[C]. IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, USA, 2016: 2536–2544. doi: 10.1109/CVPR.
    [18] LIU Guilin, DUNDAR A, SHIH K J, et al. Partial convolution for padding, inpainting, and image synthesis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(5): 6096–6110. doi: 10.1109/TPAMI.2022.3209702.
    [19] Microsoft. MSR 3D Video dataset from official microsoft download center[EB/OL]. https://www.microsoft.com/en-us/download/details.aspx?id=52358, 2014.
    [20] JOHNSON J, ALAHI A, and LI Feifei. Perceptual losses for real-time style transfer and super-resolution[C]. 14th European Conference Computer Vision 2016, Amsterdam, Netherlands, 2016: 694–711. doi: 10.1007/978-3-319-46475-6_43.
    [21] YUAN H L and VELTKAMP R C. Free-viewpoint image based rendering with multi-layered depth maps[J]. Optics and Lasers in Engineering, 2021, 147: 106726. doi: 10.1016/j.optlaseng.2021.106726.
    [22] UDDIN S M N and JUNG Y J. Global and local attention-based free-form image inpainting[J]. Sensors, 2020, 20(11): 3204. doi: 10.3390/s20113204.
    [23] STANKIEWICZ O and WEGNER K. Depth estimation reference software and view synthesis reference software[S]. Switzerland: ISO/IEC JTC1/SC29/WG11 MPEG/M16027, 2009.
    [24] LI Zhen, LU Chengze, QIN Jianhua, et al. Towards an end-to-end framework for flow-guided video inpainting[C]. Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17541–17550. doi: 10.1109/CVPR52688.2022.01704.
Publication history
  • Received: 2023-08-06
  • Revised: 2023-12-21
  • Available online: 2024-01-26
  • Published: 2024-04-24
