Super-Resolution Inpainting of Low-Resolution Randomly Occluded Face Images
Abstract: An end-to-end Super-Resolution Inpainting Generative Adversarial Network (SRIGAN) is proposed for 4× super-resolution restoration of low-resolution, randomly occluded face images. The generative network consists of an encoder, a feature compensation subnetwork, and a decoder equipped with a pyramid attention module; the discriminative network is an improved Patch discriminator. The network learns the missing features of the occluded region through the feature compensation subnetwork and a two-stage training strategy, and strengthens information reconstruction through the pyramid attention module in the decoder and a multi-scale reconstruction loss, thereby mapping a low-resolution occluded image to a 4× high-resolution complete image. In addition, the design of the loss function and the improved Patch discriminator stabilize network training and improve the performance of the generative network. Comparison experiments and module verification experiments confirm the effectiveness of the proposed algorithm.
1 Training Procedure
Loop 1 (training stage 1): train the encoder En, the decoder De, and the discriminator D for iterations $n = 1, 2, \cdots, N_1$:
Step 1  Preprocess the data to obtain ${\boldsymbol{I}}_{\text{gt}}$, ${\boldsymbol{M}}$, ${\boldsymbol{I}}_{\text{LR}}$, and ${\boldsymbol{I}}'$ (the occluded LR face image);
Step 2  Feed ${\boldsymbol{I}}_{\text{LR}}$ through En-De to obtain the generated image ${\boldsymbol{I}}_{\text{gen}}$;
Step 3  Freeze the generative network and optimize the discriminator with the loss function $L_{\text{D}}$;
Step 4  Freeze the discriminator and optimize the generative network with the loss function $L_{\text{G}} = \lambda_{\text{mul}} L_{\text{mul}} + \lambda_{\text{adv}} L_{\text{adv}} + \lambda_{\text{per}} L_{\text{per}} + \lambda_{\text{style}} L_{\text{style}}$;
Step 5  When $n = N_1$, Loop 1 ends.
Loop 2 (training stage 2): train the feature compensation network Fc, keeping the encoder En frozen so that only Fc is optimized, for iterations $n = 1, 2, \cdots, N_2$:
Step 1  Preprocess the data to obtain ${\boldsymbol{I}}_{\text{gt}}$, ${\boldsymbol{M}}$, ${\boldsymbol{I}}_{\text{LR}}$, and ${\boldsymbol{I}}'$;
Step 2  Feed ${\boldsymbol{I}}_{\text{LR}}$ through En to obtain the feature $p$;
Step 3  Feed ${\boldsymbol{I}}'$ through En-Fc to obtain the feature $q$;
Step 4  Optimize Fc with the loss function $L_{\text{Fc}}$ computed between $q$ and $p$;
Step 5  When $n = N_2$, Loop 2 ends.
A code sketch of this two-stage schedule is given below.
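To make the schedule concrete, here is a minimal, runnable PyTorch sketch. The one-layer modules are illustrative stand-ins for the actual encoder, pyramid-attention decoder, feature compensation subnetwork, and improved Patch discriminator; the hinge adversarial loss, the L1 stand-ins for $L_{\text{mul}}$ and $L_{\text{Fc}}$, and all hyperparameter values (loss weights, learning rates, $N_1$, $N_2$) are assumptions for illustration, not the paper's settings. The perceptual and style terms are omitted because they require a VGG feature extractor.

```python
# A minimal, runnable PyTorch sketch of the two-stage training schedule.
# The one-layer modules below are illustrative stand-ins, NOT the paper's
# architectures; all hyperparameter values are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

En = nn.Conv2d(3, 64, 3, padding=1)                      # encoder stub
De = nn.Sequential(nn.Upsample(scale_factor=4),          # 4x SR decoder stub
                   nn.Conv2d(64, 3, 3, padding=1))
Fc = nn.Conv2d(64, 64, 3, padding=1)                     # feature compensation stub
D  = nn.Conv2d(3, 1, 4, stride=2, padding=1)             # Patch-style critic stub

opt_G  = torch.optim.Adam(list(En.parameters()) + list(De.parameters()), lr=1e-4)
opt_D  = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_Fc = torch.optim.Adam(Fc.parameters(), lr=1e-4)
lam_mul, lam_adv = 1.0, 0.1                              # illustrative weights
N1 = N2 = 2                                              # a real run uses far more

def batch():
    """Stand-in for the Section 2 preprocessing: HR truth, LR input, occluded LR."""
    I_gt = torch.rand(4, 3, 128, 128)
    I_lr = F.interpolate(I_gt, scale_factor=0.25, mode='bilinear')
    M = (torch.rand(4, 1, 32, 32) > 0.25).float()        # toy random mask
    return I_gt, I_lr, I_lr * M                          # I' = I_LR * M

# Stage 1 (Loop 1): adversarial training of En-De against D on clean LR inputs.
for n in range(N1):
    I_gt, I_lr, I_occ = batch()
    I_gen = De(En(I_lr))                                 # Step 2
    # Step 3: freeze the generator, update D (hinge loss shown as an example L_D).
    loss_D = F.relu(1 - D(I_gt)).mean() + F.relu(1 + D(I_gen.detach())).mean()
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # Step 4: freeze D, update the generator with the weighted sum L_G.
    # L1 stands in for the multi-scale loss L_mul; perceptual/style are omitted.
    loss_G = lam_mul * F.l1_loss(I_gen, I_gt) - lam_adv * D(I_gen).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# Stage 2 (Loop 2): freeze the encoder and optimize only Fc so that features
# of the occluded image match those of the unoccluded LR image.
En.requires_grad_(False)
for n in range(N2):
    _, I_lr, I_occ = batch()
    with torch.no_grad():
        p = En(I_lr)                                     # Step 2: target features
    q = Fc(En(I_occ))                                    # Step 3: compensated features
    loss_Fc = F.l1_loss(q, p)                            # example L_Fc
    opt_Fc.zero_grad(); loss_Fc.backward(); opt_Fc.step()
```

Note how the "freeze" steps are realized by scoping each optimizer to a single sub-network's parameters (plus `detach()` in the discriminator update), so only the intended module is updated at each step.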
2 Data Preprocessing
Step 1  Select the ground-truth HR face images ${\boldsymbol{I}}_{\text{gt}}$: randomly draw $m$ HR images ${\boldsymbol{I}}_{{\text{gt}}}^{(1)},{\boldsymbol{I}}_{{\text{gt}}}^{(2)}, \cdots ,{\boldsymbol{I}}_{{\text{gt}}}^{(m)}$ from the training set to form a batch.
Step 2  Downsample ${\boldsymbol{I}}_{\text{gt}}$ with bilinear interpolation to obtain the ground-truth LR images ${\boldsymbol{I}}_{\text{LR}}$.
Step 3  Generate the random masks ${\boldsymbol{M}}$, iterating over $i = 1, 2, \cdots, m$: draw a random integer $j$ from $\{1, 2\}$; if $j = 1$, sample a mask at random from the irregular-mask dataset; if $j = 2$, randomly generate a mask position $(a, b)$ and side lengths $c$ and $d$, and build the rectangular mask whose diagonal corners are $(a, b)$ and $(a+c, b+d)$; apply a random rotation and crop to obtain ${\boldsymbol{M}}^{(i)}$.
Step 4  Generate the occluded LR face images ${\boldsymbol{I}}'^{(i)} = {\boldsymbol{I}}_{\text{LR}}^{(i)} \times {\boldsymbol{M}}^{(i)}$. A sketch of Steps 3 and 4 follows below.
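As a concrete illustration of Steps 3 and 4, the sketch below builds one random mask and applies it to an LR image. It assumes the irregular-mask dataset (e.g. the one of [25]) is available as a list of image file paths with holes stored as white pixels, and it substitutes a resize for the paper's random crop; both are assumptions made to keep the example short.

```python
# A sketch of mask generation (Step 3) and occlusion (Step 4), assuming an
# irregular-mask dataset stored as image files with white holes on black.
import random
import numpy as np
from PIL import Image

def random_mask(h, w, irregular_paths):
    """Binary h x w mask M with 1 = visible and 0 = occluded."""
    if random.randint(1, 2) == 1:                  # j = 1: irregular mask
        m = Image.open(random.choice(irregular_paths)).convert('L')
        mask = Image.eval(m, lambda v: 0 if v > 127 else 255)  # holes -> 0
    else:                                          # j = 2: random rectangle
        mask = Image.new('L', (w, h), 255)
        a, b = random.randrange(w), random.randrange(h)              # corner (a, b)
        c, d = random.randint(1, w // 2), random.randint(1, h // 2)  # sides c, d
        mask.paste(0, (a, b, min(a + c, w), min(b + d, h)))
    # Random rotation, then resize back to the target shape (the paper uses a
    # random crop; resizing keeps this sketch short), then re-binarize.
    mask = mask.rotate(random.uniform(0, 360), fillcolor=255).resize((w, h))
    return (np.asarray(mask, dtype=np.float32) > 127).astype(np.float32)

# Step 4: the occluded LR face image is the element-wise product I' = I_LR * M.
# For an LR image array I_lr of shape (h, w, 3):
# I_occ = I_lr * random_mask(h, w, mask_paths)[:, :, None]
```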
Table 1  Performance comparison of different training methods and stages

Metric       Direct inpainting training   Two-stage SR inpainting   Stage-1 SR reconstruction
PSNR (dB)↑   21.2070                      24.8608                   25.2959
SSIM↑        0.7247                       0.8903                    0.9064
MAE↓         0.0606                       0.0361                    0.0345

Table 2  Quantitative comparison of modules

Metric       Decoder   Multi-scale decoder   Full network
PSNR (dB)↑   20.3460   22.9861               24.8608
SSIM↑        0.6855    0.8218                0.8903
MAE↓         0.0685    0.0479                0.0361

Table 3  Comparison with SR-Inpainting cascade methods

Metric       DRN-CA    DRN-EC    PAN-CA    PAN-EC    HAT-MAT   HAT-AOT   SwinIR-MAT   SwinIR-AOT   Proposed
PSNR (dB)↑   24.2962   25.3198   24.5715   25.6237   25.5742   26.6459   25.5483      25.8942      24.8608
SSIM↑        0.8101    0.8649    0.8174    0.8725    0.8616    0.8861    0.8596       0.8773       0.8903
MAE↓         0.0399    0.0361    0.0388    0.0349    0.0353    0.0327    0.0364       0.0334       0.0361

Table 4  Comparison with Inpainting-SR cascade methods

Metric       CA-DRN    EC-DRN    CA-PAN    EC-PAN    MAT-HAT   AOT-HAT   MAT-SwinIR   AOT-SwinIR   Proposed
PSNR (dB)↑   24.3939   22.8383   22.2123   24.4875   25.8072   26.5986   23.8924      24.6235      24.8608
SSIM↑        0.8413    0.8106    0.7903    0.8449    0.8191    0.8315    0.7710       0.7827       0.8903
MAE↓         0.0457    0.0539    0.0578    0.0453    0.0367    0.0342    0.0531       0.0475       0.0361
[1] LIU Ying, ZHANG Yixuan, SHE Jianchu, et al. Review of new face occlusion inpainting technology research[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(10): 1773–1794. doi: 10.3778/j.issn.1673-9418.2103092.
[2] LU Qimeng, MAO Xiao, LING Rong, et al. Influence of mask wearing on identification of human images[J]. Chinese Journal of Forensic Sciences, 2021(5): 89–94. doi: 10.3969/j.issn.1671-2072.2021.05.010.
[3] LIAO Haibin, CHEN Youbin, and CHEN Qinghu. Non-local similarity dictionary learning based super-resolution for improved face recognition[J]. Geomatics and Information Science of Wuhan University, 2016, 41(10): 1414–1420. doi: 10.13203/j.whugis20140498.
[4] WANG Shanbao, LIANG Dong, and SHEN Ling. Image inpainting with multi-modal attention mechanism generative networks[J]. Journal of Computer-Aided Design & Computer Graphics, 2023, 35(7): 1109–1121. doi: 10.3724/SP.J.1089.2023.19578.
[5] ZHANG Ziying and ZHOU Hua. Research on inpainting algorithm of digital murals based on enhanced structural information[J]. Journal of System Simulation, 2022, 34(7): 1524–1531. doi: 10.16182/j.issn1004731x.joss.21-0034.
[6] BARNES C, SHECHTMAN E, FINKELSTEIN A, et al. PatchMatch: A randomized correspondence algorithm for structural image editing[J]. ACM Transactions on Graphics, 2009, 28(3): 24. doi: 10.1145/1531326.1531330.
[7] BERTALMIO M, SAPIRO G, CASELLES V, et al. Image inpainting[C]. The 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, USA, 2000: 417–424. doi: 10.1145/344779.344972.
[8] ESEDOGLU S and SHEN Jianhong. Digital inpainting based on the Mumford–Shah–Euler image model[J]. European Journal of Applied Mathematics, 2002, 13(4): 353–370. doi: 10.1017/S0956792502004904.
[9] YU Jiahui, LIN Zhe, YANG Jimei, et al. Generative image inpainting with contextual attention[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5505–5514. doi: 10.1109/CVPR.2018.00577.
[10] NAZERI K, NG E, JOSEPH T, et al. EdgeConnect: Generative image inpainting with adversarial edge learning[J]. arXiv preprint arXiv:1901.00212, 2019.
[11] LI Wenbo, LIN Zhe, ZHOU Kun, et al. MAT: Mask-aware transformer for large hole image inpainting[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 10758–10768. doi: 10.1109/CVPR52688.2022.01049.
[12] ZENG Yanhong, FU Jianlong, CHAO Hongyang, et al. Aggregated contextual transformations for high-resolution image inpainting[J]. IEEE Transactions on Visualization and Computer Graphics, 2023, 29(7): 3266–3280. doi: 10.1109/TVCG.2022.3156949.
[13] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montréal, Canada, 2014: 1384–1393.
[14] GUO Yong, CHEN Jian, WANG Jingdong, et al. Closed-loop matters: Dual regression networks for single image super-resolution[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 5407–5416. doi: 10.1109/CVPR42600.2020.00545.
[15] MEI Yiqun, FAN Yuchen, ZHANG Yulun, et al. Pyramid attention network for image restoration[J]. International Journal of Computer Vision, 2023, 131(12): 3207–3225. doi: 10.1007/s11263-023-01843-5.
[16] WANG Ronggui, LEI Hui, YANG Juan, et al. Self-similarity enhancement network for image super-resolution[J]. Opto-Electronic Engineering, 2022, 49(5): 210382. doi: 10.12086/oee.2022.210382.
[17] HUANG Youwen, TANG Xin, and ZHOU Bin. Image super-resolution reconstruction network with dual attention and structural similarity measure[J]. Chinese Journal of Liquid Crystals and Displays, 2022, 37(3): 367–375. doi: 10.37188/CJLCD.2021-0178.
[18] CHEN Xiangyu, WANG Xintao, ZHOU Jiantao, et al. Activating more pixels in image super-resolution transformer[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, 2023: 22367–22377. doi: 10.1109/CVPR52729.2023.02142.
[19] LIANG Jingyun, CAO Jiezhang, SUN Guolei, et al. SwinIR: Image restoration using Swin Transformer[C]. 2021 IEEE/CVF International Conference on Computer Vision Workshops, Montreal, Canada, 2021: 1833–1844. doi: 10.1109/ICCVW54120.2021.00210.
[20] ARJOVSKY M, CHINTALA S, and BOTTOU L. Wasserstein GAN[J]. arXiv preprint arXiv:1701.07875, 2017.
[21] GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 5767–5777.
[22] MIYATO T, KATAOKA T, KOYAMA M, et al. Spectral normalization for generative adversarial networks[J]. arXiv preprint arXiv:1802.05957, 2018.
[23] ISOLA P, ZHU Junyan, ZHOU Tinghui, et al. Image-to-image translation with conditional adversarial networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1125–1134. doi: 10.1109/CVPR.2017.632.
[24] JOHNSON J, ALAHI A, and FEI-FEI L. Perceptual losses for real-time style transfer and super-resolution[C]. The 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 2016: 694–711. doi: 10.1007/978-3-319-46475-6_43.
[25] LIU Guilin, REDA F A, SHIH K J, et al. Image inpainting for irregular holes using partial convolutions[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 85–100. doi: 10.1007/978-3-030-01252-6_6.