Adaptive Multi-scale Information Fusion Based on Dynamic Receptive Field for Image-to-image Translation

Mengxiao YIN, Zhenfeng LIN, Feng YANG

Citation: Mengxiao YIN, Zhenfeng LIN, Feng YANG. Adaptive Multi-scale Information Fusion Based on Dynamic Receptive Field for Image-to-image Translation[J]. Journal of Electronics & Information Technology, 2021, 43(8): 2386-2394. doi: 10.11999/JEIT200675


doi: 10.11999/JEIT200675
Details
    About the authors:

    Mengxiao YIN: Female, born in 1978, Ph.D., associate professor, CCF member. Her research interests include computer graphics and virtual reality, digital geometry processing, and image and video editing.

    Zhenfeng LIN: Male, born in 1996, master's student. His research interests include image generation and image-to-image translation.

    Feng YANG: Male, born in 1979, Ph.D., associate professor, CCF member. His research interests include artificial intelligence, network information security, big data and high-performance computing, and precision medicine.

    Corresponding author:

    Feng YANG, yf@gxu.edu.cn

  • CLC number: TN911.73; TP391


Funds: The National Natural Science Foundation of China (61762007, 61861004), The Natural Science Foundation of Guangxi (2017GXNSFAA198269, 2017GXNSFAA198267)
  • Abstract: To improve the quality of images produced by image-to-image translation models, this paper improves the generator of the translation model and, at the same time, explores diversified image translation to extend the model's generative capability. For the generator, the dynamic receptive field mechanism of the Selective Kernel Block (SKBlock) is used to obtain and fuse the multi-scale information of each upsampled feature in the generator; building on this multi-scale information and the dynamic receptive field, a Selective Kernel Generative Adversarial Network (SK-GAN) is constructed. Compared with the traditional generator, SK-GAN improves the quality of generated images through a generative structure that acquires multi-scale information with dynamic receptive fields. For diversified image translation, a Guided Selective Kernel Generative Adversarial Network (GSK-GAN) based on SK-GAN is proposed for the task of synthesizing realistic images from sketches. This model uses a guide image to direct the translation of the source image: a guide image encoder extracts the guide image features, and a Parameter Generator (PG) and Feature Transformation (FT) layer pass the guide information to the generator. In addition, a dual-branch guide image encoder is proposed to improve the model's editing capability, and random-style image generation is realized through the latent distribution of the guide image. Experiments show that the improved generator helps raise the quality of generated images and that SK-GAN produces reasonable results on multiple datasets. GSK-GAN not only preserves the quality of generated images but also generates images in more diverse styles.
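
The dynamic receptive field mechanism above follows the selective-kernel design of Li et al. [13]. As a rough illustration only (not the authors' released code; branch sizes, the reduction ratio, and layer names are our assumptions), a minimal PyTorch sketch of an SKBlock with 3×3 and 5×5 branches (the K35 combination that Table 5 shows performing best on FID) could look like this:

```python
# Minimal SKBlock sketch (assumption: follows the selective-kernel design of
# Li et al. [13]; branch sizes, reduction ratio, and names are illustrative,
# not the paper's released implementation).
import torch
import torch.nn as nn

class SKBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Two parallel branches with different receptive fields (K35: 3x3 and 5x5).
        self.branch3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        hidden = max(channels // reduction, 8)
        # Bottleneck producing per-branch, per-channel selection logits.
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels * 2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        feats = torch.stack([self.branch3(x), self.branch5(x)], dim=1)  # (b, 2, c, h, w)
        # Fuse: global statistics of the summed branch features.
        s = feats.sum(dim=1).mean(dim=(2, 3))                           # (b, c)
        # Select: softmax over the branch axis -> dynamic receptive field.
        attn = self.fc(s).view(b, 2, c).softmax(dim=1)                  # (b, 2, c)
        return (feats * attn.unsqueeze(-1).unsqueeze(-1)).sum(dim=1)    # (b, c, h, w)

# Usage: fuse multi-scale information of an upsampled generator feature map.
feat = torch.randn(1, 64, 32, 32)
out = SKBlock(64)(feat)
assert out.shape == feat.shape
```

The softmax over the branch axis is what makes the receptive field "dynamic": each channel mixes the two scales with input-dependent weights rather than a fixed kernel size.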
  • Figure 1. Architecture of the translation model

    Figure 2. Upsampling process in the generator

    Figure 3. Structure of SKBlock and its dynamic feature selection process

    Figure 4. Architecture of the GSK-GAN model

    $\mu$ and $\sigma$ are the mean and standard deviation of the guide image's latent distribution, $z$ is the latent variable, and $\odot$ denotes channel-wise concatenation of features (a minimal sampling sketch follows this figure list).

    Figure 5. Ways of transferring guide image information

    Figure 6. Comparison of results for synthesizing realistic images from sketches

    Figure 7. Comparison of results for synthesizing realistic images from semantic images

    Figure 8. Comparison of results generated by multimodal image translation

    Figure 9. Results generated with the dual-branch guide image encoder on the Edges2shoes dataset

    Figure 10. Results generated with latent variables on the Edges2shoes dataset

    Figure 11. Results generated with mismatched textures on the Edges2shoes dataset

    Figure 12. Selection weights of the multi-scale information for the upsampling-layer features on multiple datasets

    Figure 13. Diverse results generated with different ways of transferring guide image information
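
As noted under Figure 4, random-style generation draws $z$ from the guide image's latent distribution and concatenates it with source features along the channel axis. A minimal sketch, assuming standard VAE-style reparameterization [3]; the encoder producing $\mu$/$\sigma$ and all tensor shapes are placeholders, not the paper's exact modules:

```python
# Hedged sketch of the latent sampling noted under Figure 4 (assumes standard
# VAE-style reparameterization [3]; shapes and names are illustrative).
import torch

def sample_style(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Draw z from N(mu, sigma^2) via reparameterization: z = mu + sigma * eps."""
    eps = torch.randn_like(sigma)
    return mu + sigma * eps

# mu/sigma come from the guide image encoder; z is then concatenated with a
# source feature map along the channel axis (the caption's ⊙ operation).
mu = torch.zeros(1, 8)
sigma = torch.ones(1, 8)
z = sample_style(mu, sigma)                        # (1, 8)
src_feat = torch.randn(1, 64, 32, 32)
z_map = z.view(1, 8, 1, 1).expand(-1, -1, 32, 32)  # broadcast z spatially
fused = torch.cat([src_feat, z_map], dim=1)        # channel-wise concatenation
assert fused.shape == (1, 72, 32, 32)
```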

    Table 1. Quantitative comparison on the Edges2shoes and Edges2handbags datasets

             Edges2shoes                        Edges2handbags
             Pix2pix[1]  DRPAN[7]  SK-GAN      Pix2pix[1]  DRPAN[7]  SK-GAN
    SSIM     0.749       0.764     0.788       0.641       0.671     0.676
    PSNR     20.001      19.739    20.606      16.475      17.384    17.171
    FID      69.213      43.883    45.168      73.675      69.606    68.957
    LPIPS    0.183       0.176     0.161       0.267       0.260     0.254

    Table 2. Quantitative comparison on the Cityscapes dataset

                  Per-pixel acc   Per-class acc   Class IOU
    L1+CGAN[1]    0.63            0.21            0.16
    CRN[22]       0.69            0.21            0.20
    DRPAN[7]      0.73            0.24            0.19
    SK-GAN        0.76            0.25            0.20

    Table 3. Quantitative comparison of multimodal image translation on the Edges2shoes and Edges2handbags datasets

             Edges2shoes                             Edges2handbags
             TextureGAN[9]  Ref. [10]  GSK-GAN      TextureGAN[9]  Ref. [10]  GSK-GAN
    FID      44.190         118.988    45.041       61.068         73.290     60.753
    LPIPS    0.123          0.123      0.119        0.171          0.162      0.154
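
The GSK-GAN results in Table 3 depend on the PG and FT layers that pass guide-image information to the generator (see the abstract). One plausible reading, assumed here in the spirit of the feature transformation of ref. [10] rather than taken from the paper, is a per-channel scale-and-shift modulation:

```python
# Hedged sketch of the PG -> FT path from the abstract: a parameter generator
# maps guide-image features to per-channel scale/shift, and the feature
# transformation layer applies them to a generator feature map. This is an
# assumed FiLM-style reading (cf. ref. [10]), not the paper's exact layer.
import torch
import torch.nn as nn

class ParamGenerator(nn.Module):
    """PG: predict (gamma, beta) from globally pooled guide features."""
    def __init__(self, guide_channels: int, feat_channels: int):
        super().__init__()
        self.proj = nn.Linear(guide_channels, feat_channels * 2)

    def forward(self, guide_feat: torch.Tensor):
        pooled = guide_feat.mean(dim=(2, 3))           # (b, guide_channels)
        gamma, beta = self.proj(pooled).chunk(2, dim=1)
        return gamma, beta

def feature_transform(x: torch.Tensor, gamma: torch.Tensor, beta: torch.Tensor):
    """FT: modulate each channel of the generator feature with guide parameters."""
    return x * (1 + gamma)[..., None, None] + beta[..., None, None]

pg = ParamGenerator(guide_channels=128, feat_channels=64)
guide = torch.randn(1, 128, 16, 16)    # guide image encoder output (placeholder)
gen_feat = torch.randn(1, 64, 32, 32)  # generator feature map (placeholder)
gamma, beta = pg(guide)
out = feature_transform(gen_feat, gamma, beta)
assert out.shape == gen_feat.shape
```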

    Table 4. Image quality comparison for different upsampling processes in the generator

                    SSIM    PSNR     FID       LPIPS
    Mode 1          0.267   12.821   102.771   0.415
    Mode 2          0.267   12.853   92.608    0.404
    Mode 3          0.284   12.981   89.718    0.405
    Mode 3 (GAN)    0.262   12.568   97.828    0.399

    Table 5. Image quality comparison for different receptive-field branch combinations in SKBlock (Kmn denotes branches with m×m and n×n kernels)

           SSIM    PSNR     FID       LPIPS
    K13    0.276   12.961   100.532   0.398
    K35    0.284   12.981   89.718    0.405
    K57    0.268   13.007   98.132    0.400
  • [1] ISOLA P, ZHU Junyan, ZHOU Tinghui, et al. Image-to-image translation with conditional adversarial networks[C]. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, USA, 2017: 5967–5976. doi: 10.1109/CVPR.2017.632.
    [2] CHEN Wengling and HAYS J. SketchyGAN: Towards diverse and realistic sketch to image synthesis[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 9416–9425. doi: 10.1109/CVPR.2018.00981.
    [3] KINGMA D P and WELLING M. Auto-encoding variational Bayes[EB/OL]. https://arxiv.org/abs/1312.6114, 2013.
    [4] GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]. The 27th International Conference on Neural Information Processing Systems, Montreal, Canada, 2014: 2672–2680.
    [5] RADFORD A, METZ L, and CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. https://arxiv.org/abs/1511.06434, 2015.
    [6] SUNG T L and LEE H J. Image-to-image translation using identical-pair adversarial networks[J]. Applied Sciences, 2019, 9(13): 2668. doi: 10.3390/app9132668.
    [7] WANG Chao, ZHENG Haiyong, YU Zhibin, et al. Discriminative region proposal adversarial networks for high-quality image-to-image translation[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 796–812. doi: 10.1007/978-3-030-01246-5_47.
    [8] ZHU Junyan, ZHANG R, PATHAK D, et al. Toward multimodal image-to-image translation[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 465–476.
    [9] XIAN Wenqi, SANGKLOY P, AGRAWAL V, et al. TextureGAN: Controlling deep image synthesis with texture patches[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8456–8465. doi: 10.1109/CVPR.2018.00882.
    [10] ALBAHAR B and HUANG Jiabin. Guided image-to-image translation with bi-directional feature transformation[C]. The 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Korea (South), 2019: 9015–9024. doi: 10.1109/ICCV.2019.00911.
    [11] SUN Wei and WU Tianfu. Learning spatial pyramid attentive pooling in image synthesis and image-to-image translation[EB/OL]. https://arxiv.org/abs/1901.06322, 2019.
    [12] ZHU Junyan, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2242–2251. doi: 10.1109/ICCV.2017.244.
    [13] LI Xiang, WANG Wenhai, HU Xiaolin, et al. Selective kernel networks[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019: 510–519. doi: 10.1109/CVPR.2019.00060.
    [14] SZEGEDY C, IOFFE S, VANHOUCKE V, et al. Inception-v4, inception-ResNet and the impact of residual connections on learning[EB/OL]. https://arxiv.org/abs/1602.07261, 2016.
    [15] LIU Changyuan, WANG Qi, and BI Xiaojun. Research on rain removal method for single image based on multi-channel and multi-scale CNN[J]. Journal of Electronics & Information Technology, 2020, 42(9): 2285–2292. doi: 10.11999/JEIT190755.
    [16] LI Juncheng, FANG Faming, MEI Kangfu, et al. Multi-scale residual network for image super-resolution[C]. The 15th European Conference on Computer Vision, Munich, Germany, 2018: 527–542. doi: 10.1007/978-3-030-01237-3_32.
    [17] MAO Xudong, LI Qing, XIE Haoran, et al. Least squares generative adversarial networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 2813–2821. doi: 10.1109/ICCV.2017.304.
    [18] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]. The 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6629–6640.
    [19] ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 586–595. doi: 10.1109/CVPR.2018.00068.
    [20] CORDTS M, OMRAN M, RAMOS S, et al. The cityscapes dataset for semantic urban scene understanding[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, USA, 2016: 3213–3223. doi: 10.1109/CVPR.2016.350.
    [21] TYLEČEK R and ŠÁRA R. Spatial pattern templates for recognition of objects with regular structure[C]. The 35th German Conference on Pattern Recognition, Saarbrücken, Germany, 2013: 364–374. doi: 10.1007/978-3-642-40602-7_39.
    [22] CHEN Qifeng and KOLTUN V. Photographic image synthesis with cascaded refinement networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 2017: 1520–1529. doi: 10.1109/ICCV.2017.168.
Publication history
  • Received: 2020-08-04
  • Revised: 2021-01-04
  • Available online: 2021-01-10
  • Published in issue: 2021-08-10
