
Image Deraining Driven by CLIP Visual Embedding

SUN Jin, CUI Yuntong, TIAN Hongwei, HUANG Changcheng, WANG Jigang

Citation: SUN Jin, CUI Yuntong, TIAN Hongwei, HUANG Changcheng, WANG Jigang. Image Deraining Driven by CLIP Visual Embedding[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251066


doi: 10.11999/JEIT251066 cstr: 32379.14.JEIT251066
Funds: National Natural Science Foundation of China (61702260)
    Author information:

    SUN Jin: Female, Associate Professor. Research interests: computer vision, image processing and analysis

    CUI Yuntong: Male, M.S. candidate. Research interests: image enhancement and restoration

    TIAN Hongwei: Male, M.S. candidate. Research interests: image processing and analysis, object tracking

    HUANG Changcheng: Male, M.S. candidate. Research interests: image enhancement and restoration

    WANG Jigang: Male, M.S. candidate. Research interests: image enhancement and restoration

    Corresponding author:

    SUN Jin  sunjinly@nuaa.edu.cn

  • CLC number: TP391.4

  • Abstract: Image deraining is a fundamental task in computer vision, yet existing methods rely heavily on assumed rain models or synthetic rain datasets, so their performance generalizes poorly to real-world scenes. This paper finds that the image encoder of the CLIP model is robust to rain-streak interference, recasts deraining as a pixel-level regression problem guided by visual semantics, and proposes FCLIP-UNet, an image deraining model based on a Frozen Contrastive Language-Image Pretraining (FCLIP) strategy. The model adopts a symmetric encoder-decoder (U-Net) structure: the encoder truncates the first four stages of the CLIP-RN50 image encoder to automatically decouple rain streaks from image-content semantics; the decoder serially stacks ConvNeXt-T and UpDWBlock modules and embeds a layer-wise differentiated perturbation strategy in the skip connections, jointly strengthening detail recovery under high-level semantic guidance and generalization ability. Quantitative and qualitative experiments show that FCLIP-UNet achieves the best or competitive performance on public synthetic and real rain datasets, and generalizes well on multiple independent datasets containing real rain images.
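The layer-wise differentiated perturbation of skip features described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name `perturb_skip_features` is hypothetical, and the per-level σ values follow the best-performing setting reported in Table 11 (σ increasing with depth).

```python
import numpy as np

def perturb_skip_features(feats, sigmas=(0.01, 0.03, 0.06, 0.1),
                          training=True, rng=None):
    """Add zero-mean Gaussian noise to each skip-connection feature map.

    feats  : list of 4 feature arrays, ordered shallow to deep.
    sigmas : per-level noise standard deviation; increasing with depth
             follows the best setting in Table 11 (hypothetical sketch).
    Noise is injected only during training; inference passes features through.
    """
    if not training:
        return [f.copy() for f in feats]
    rng = rng or np.random.default_rng()
    return [f + rng.normal(0.0, s, size=f.shape) for f, s in zip(feats, sigmas)]

# Toy skip features at 4 scales (C, H, W): 64x64 down to 8x8.
feats = [np.zeros((8, 64 >> i, 64 >> i)) for i in range(4)]
train_out = perturb_skip_features(feats, training=True,
                                  rng=np.random.default_rng(0))
eval_out = perturb_skip_features(feats, training=False)
```

During training the deepest (most semantic) skip path receives the strongest perturbation, which matches the ablation trend in Table 11 where increasing σ with depth gives the best PSNR/SSIM.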
  • Figure 1  FCLIP-UNet network architecture

    Figure 2  Similarity between image semantic features and text prompts (heat map) and classification probabilities (bar chart)

    Figure 3  Analysis of features extracted by five CLIP ResNet encoders under rain streaks of different densities

    Figure 4  Decoder structure

    Figure 5  Deraining results on sample images from the Test2800 test set

    Figure 6  Deraining results on sample images from the Test1200 test set

    Figure 7  Generalization comparison of different methods across multiple datasets

    Figure 8  Deraining results on real rain image example 1

    Figure 9  Deraining results on real rain image example 2

    Figure 10  Deraining results on real rain image example 3
    Table 1  Matching between Test1200 images and fine-grained text labels

    Image type                  light rain  moderate rain  heavy rain
    Low-density rain images     196         97             107
    Medium-density rain images  68          188            144
    High-density rain images    95          157            148

    Table 2  Composition of the Rain13K dataset

    Source            Rain800  Rain100H  Rain100L  Rain14000  Rain1200  Rain12  Total
    Training samples  700      1800      0         11200      0         12      13712
    Test samples      100      100       100       2800       1200      0       4300
    Test set name     Test100  Rain100H  Rain100L  Test2800   Test1200  -       -

    Table 3  Comparison results on the Rain13K dataset

    Algorithm Test100 Rain100H Rain100L Test2800 Test1200 Average
    PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM PSNR SSIM
    DerainNet[4] 22.77 0.810 14.92 0.592 27.03 0.884 24.31 0.861 23.38 0.835 22.48 0.796
    DID-MDN[24] 22.56 0.818 17.35 0.524 25.23 0.741 28.13 0.867 29.95 0.901 24.58 0.770
    RESCAN[26] 25.00 0.835 26.36 0.785 29.80 0.881 31.29 0.904 30.51 0.882 28.59 0.857
    MSPFN[27] 27.50 0.876 28.66 0.860 32.40 0.933 32.82 0.930 32.39 0.916 30.75 0.903
    MPRNet[21] 30.27 0.897 30.41 0.890 36.40 0.965 33.64 0.938 32.91 0.916 32.73 0.921
    Uformer-B[28] 29.90 0.906 30.31 0.900 36.86 0.972 33.53 0.939 29.45 0.903 32.01 0.924
    IDT[11] 29.69 0.905 29.95 0.898 37.01 0.971 33.38 0.937 31.38 0.908 32.28 0.924
    DCTR[29] 30.91 0.912 30.74 0.892 38.19 0.974 33.89 0.941 33.57 0.926 33.46 0.929
    AFENet[30] 30.51 0.918 31.22 0.901 37.66 0.978 33.13 0.925 33.82 0.944 33.27 0.933
    DPCNet[31] 30.59 0.914 30.73 0.899 37.96 0.974 33.23 0.928 33.87 0.941 33.28 0.931
    FCLIP-UNet(Ours) 31.23 0.924 30.82 0.903 38.06 0.972 34.03 0.943 33.27 0.928 33.48 0.934
    Note: in each column, bold marks the best value and underline the second best (the same convention applies to the other tables in this paper).
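The PSNR values reported above follow the standard definition PSNR = 10·log10(MAX²/MSE). A small self-contained sketch (the helper name `psnr` is ours, not from the paper):

```python
import numpy as np

def psnr(reference, restored, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images in [0, max_val]."""
    mse = np.mean((reference.astype(np.float64)
                   - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)

# A uniform error of 0.1 on a unit-range image gives MSE = 0.01 -> 20 dB.
ref = np.full((32, 32), 0.5)
out = ref + 0.1
print(round(psnr(ref, out), 2))  # -> 20.0
```

SSIM, the second metric in these tables, additionally compares local luminance, contrast, and structure rather than raw pixel error, which is why the two columns can rank methods differently.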

    Table 4  Cross-dataset comparison results

    Algorithm SPA-Data HQ-RAIN MPID
    PSNR SSIM PSNR SSIM PSNR SSIM
    DID-MDN 31.12 0.937 23.62 0.640 23.09 0.794
    RESCAN 34.57 0.958 23.79 0.519 26.74 0.823
    MSPFN 34.55 0.961 23.99 0.572 27.48 0.849
    MPRNet 35.16 0.954 26.36 0.681 31.53 0.896
    Uformer-B 35.03 0.948 26.67 0.685 31.47 0.893
    IDT 35.61 0.957 26.88 0.679 31.63 0.899
    DCTR 35.87 0.963 27.33 0.684 31.81 0.905
    DPCNet 35.64 0.958 28.56 0.786 31.59 0.894
    FCLIP-UNet(Ours) 36.39 0.967 30.36 0.858 31.96 0.913

    Table 5  No-reference image quality metrics on NTURain-R

    Algorithm Input DerainNet DID-MDN RESCAN MSPFN MPRNet Uformer-B IDT DCTR DPCNet Ours
    NIQE 6.211 6.432 5.988 5.745 4.873 4.834 4.473 4.352 4.533 4.378 4.286
    BRISQUE 33.156 31.167 30.896 31.766 29.866 30.651 28.768 27.378 26.245 26.509 25.676

    Table 6  Ablation results of different CLIP encoders on Rain100H

    Encoder        PSNR/dB  SSIM
    ResNet50       16.53    0.546
    CLIP-ViT-B/32  29.78    0.878
    CLIP-ViT-B/16  30.25    0.885
    CLIP-RN50      30.82    0.903

    Table 7  FLOPs and inference speed of different CLIP encoders

    CLIP encoder             RN50       ViT-B/32   ViT-B/16
    FLOPs                    2.36×10^10 2.43×10^11 9.76×10^11
    Inference speed (s/frame) 0.23      0.56       1.06

    Table 8  Ablation results on Rain100H

    Network  UpDWBlock  LDFPS  PSNR/dB  SSIM
    N1       -          -      30.02    0.879
    N2       ✓          -      30.42    0.893
    N3       -          ✓      30.29    0.887
    N4       ✓          ✓      30.82    0.903

    Table 9  Ablation of different loss functions

    Loss function  Lmse   L1     Lmse+Lp  L1+Lp
    PSNR/dB        29.44  29.76  29.39    30.82
    SSIM           0.880  0.892  0.885    0.903

    Table 10  Ablation of different λp values

    λp       0.1    1      10
    PSNR/dB  29.51  30.82  29.66
    SSIM     0.884  0.903  0.887
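Tables 9 and 10 point to a training objective of the form L = L1 + λp·Lp, with the L1 pixel term plus a perceptual term Lp and λp = 1 performing best. Below is a hedged sketch of such a composite loss; the feature network behind Lp is not specified on this page, so a fixed random projection stands in for it purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in "feature extractor": a fixed random linear map over flattened
# 32x32 images. The paper's actual perceptual network for Lp is not given
# here; this projection is only a placeholder for a frozen feature space.
W = rng.normal(size=(64, 32 * 32))

def features(img):
    return W @ img.reshape(-1)

def total_loss(pred, target, lam_p=1.0):
    """L = L1 (pixel) + lam_p * Lp (feature-space), cf. Tables 9 and 10."""
    l1 = np.mean(np.abs(pred - target))
    lp = np.mean(np.abs(features(pred) - features(target)))
    return l1 + lam_p * lp

target = rng.random((32, 32))
loss_same = total_loss(target.copy(), target)      # identical images -> 0
loss_noisy = total_loss(target + 0.05, target)     # any error -> positive
```

With λp = 1 both terms contribute on comparable footing, consistent with Table 10 where both smaller (0.1) and larger (10) weights degrade PSNR and SSIM.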

    Table 11  Ablation of perturbation strategies with different strengths (σi)

    Perturbation strategy                                              PSNR/dB  SSIM
    Uniform low strength (σ1=σ2=σ3=σ4=0.01)                            30.63    0.892
    Uniform high strength (σ1=σ2=σ3=σ4=0.1)                            30.58    0.887
    Strength decreasing with depth (σ1=0.1, σ2=0.06, σ3=0.03, σ4=0.01) 30.11    0.880
    Strength increasing with depth (σ1=0.01, σ2=0.03, σ3=0.06, σ4=0.1) 30.82    0.903
  • [1] LI Yufeng, LU Jiyang, CHEN Hongming, et al. Dilated convolutional transformer for high-quality image deraining[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Vancouver, Canada, 2023: 4199–4207. doi: 10.1109/CVPRW59228.2023.00442.
    [2] KANG Liwei, LIN C W, and FU Y H. Automatic single-image-based rain streaks removal via image decomposition[J]. IEEE Transactions on Image Processing, 2012, 21(4): 1742–1755. doi: 10.1109/TIP.2011.2179057.
    [3] ZHU Lei, FU C W, LISCHINSKI D, et al. Joint Bi-layer optimization for single-image rain streak removal[C]. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 2017: 2545–2553. doi: 10.1109/ICCV.2017.276.
    [4] FU Xueyang, HUANG Jiabin, DING Xinghao, et al. Clearing the skies: A deep network architecture for single-image rain removal[J]. IEEE Transactions on Image Processing, 2017, 26(6): 2944–2956. doi: 10.1109/TIP.2017.2691802.
    [5] FU Xueyang, HUANG Jiabin, ZENG Delu, et al. Removing rain from single images via a deep detail network[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, USA, 2017: 1715–1723. doi: 10.1109/CVPR.2017.186.
    [6] MEI Tiancan, CAO Min, YANG Hong, et al. Two-stage rain image removal based on density guidance[J]. Journal of Electronics & Information Technology, 2023, 45(4): 1383–1390. doi: 10.11999/JEIT220157.
    [7] REN Dongwei, ZUO Wangmeng, HU Qinghua, et al. Progressive image deraining networks: A better and simpler baseline[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3932–3941. doi: 10.1109/CVPR.2019.00406.
    [8] WEI Wei, MENG Deyu, ZHAO Qian, et al. Semi-supervised transfer learning for image rain removal[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 3872–3881. doi: 10.1109/CVPR.2019.00400.
    [9] YASARLA R, SINDAGI V A, and PATEL V M. Syn2real transfer learning for image deraining using gaussian processes[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2020: 2723–2733. doi: 10.1109/CVPR42600.2020.00280.
    [10] JIANG Kui, WANG Zhongyuan, CHEN Chen, et al. Magic ELF: Image deraining meets association learning and transformer[C]. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 2022: 827–836. doi: 10.1145/3503161.3547760.
    [11] XIAO Jie, FU Xueyang, LIU Aiping, et al. Image de-raining transformer[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12978–12995. doi: 10.1109/TPAMI.2022.3183612.
    [12] CUI Yuning, REN Wenqi, CAO Xiaochun, et al. Revitalizing convolutional network for image restoration[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 9423–9438. doi: 10.1109/TPAMI.2024.3419007.
    [13] RADFORD A, KIM J W, HALLACY C, et al. Learning transferable visual models from natural language supervision[C]. Proceedings of the 38th International Conference on Machine Learning, 2021: 8748–8763.
    [14] MA Wenxin, ZHANG Xu, YAO Qingsong, et al. AA-CLIP: Enhancing zero-shot anomaly detection via anomaly-aware CLIP[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2025: 4744–4754. doi: 10.1109/CVPR52734.2025.00447.
    [15] SUN Zeyi, FANG Ye, WU Tong, et al. Alpha-CLIP: A CLIP model focusing on wherever you want[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 13019–13029. doi: 10.1109/CVPR52733.2024.01237.
    [16] WANG Mengmeng, XING Jiazheng, JIANG Boyuan, et al. A multimodal, multi-task adapting framework for video action recognition[C]. Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada: AAAI, 2024: 5517–5525. doi: 10.1609/aaai.v38i6.28361.
    [17] LUO Ziwei, GUSTAFSSON F K, ZHAO Zheng, et al. Controlling vision-language models for multi-task image restoration[C]. Proceedings of the 12th International Conference on Learning Representations, Vienna, Austria, 2024.
    [18] LIN Jingbo, ZHANG Zhilu, WEI Yuxiang, et al. Improving image restoration through removing degradations in textual representations[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 2866–2878. doi: 10.1109/CVPR52733.2024.00277.
    [19] WEN Yuanbo, GAO Tao, AN Yisheng, et al. Weather-degraded image restoration based on visual prompt learning[J]. Chinese Journal of Computers, 2024, 47(10): 2401–2416. doi: 10.11897/SP.J.1016.2024.02401.
    [20] WANG Ruiyi, LI Wenhao, LIU Xiaohong, et al. HazeCLIP: Towards language guided real-world image dehazing[C]. ICASSP 2025–2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 2025: 1–5. doi: 10.1109/ICASSP49660.2025.10889509.
    [21] CHENG Jun, LIANG Dong, and TAN Shan. Transfer CLIP for generalizable image denoising[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, USA, 2024: 25974–25984. doi: 10.1109/CVPR52733.2024.02454.
    [22] LIU Zhuang, MAO Hanzi, WU Chaoyuan, et al. A ConvNet for the 2020s[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 11966–11976. doi: 10.1109/CVPR52688.2022.01167.
    [23] ZAMIR S W, ARORA A, KHAN S, et al. Multi-stage progressive image restoration[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 14816–14826. doi: 10.1109/CVPR46437.2021.01458.
    [24] ZHANG He and PATEL V M. Density-aware single image de-raining using a multi-stream dense network[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 695–704. doi: 10.1109/CVPR.2018.00079.
    [25] ZHOU Tianfei, YUAN Ye, WANG Binglu, et al. Federated feature augmentation and alignment[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024, 46(12): 11119–11135. doi: 10.1109/TPAMI.2024.3457751.
    [26] LI Xia, WU Jianlong, LIN Zhouchen, et al. Recurrent squeeze-and-excitation context aggregation net for single image deraining[C]. Proceedings of the 15th European Conference on Computer Vision (ECCV), Munich, Germany, 2018: 262–277. doi: 10.1007/978-3-030-01234-2_16.
    [27] JIANG Kui, LIU Wenxuan, WANG Zheng, et al. DAWN: Direction-aware attention wavelet network for image deraining[C]. Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, 2023: 7065–7074. doi: 10.1145/3581783.3611697.
    [28] WANG Zhendong, CUN Xiaodong, BAO Jianmin, et al. Uformer: A general U-shaped transformer for image restoration[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 17662–17672. doi: 10.1109/CVPR52688.2022.01716.
    [29] LI Yufeng, LU Jiyang, CHEN Hongming, et al. Dilated convolutional transformer for high-quality image deraining[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Vancouver, Canada, 2023: 4199–4207. doi: 10.1109/CVPRW59228.2023.00442.
    [30] YAN Fei, HE Yuhong, CHEN Keyu, et al. Adaptive frequency enhancement network for single image deraining[C]. 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Kuching, Malaysia, 2024: 4534–4541. doi: 10.1109/SMC54092.2024.10831025.
    [31] HE Yuhong, JIANG Aiwen, JIANG Lingfang, et al. Dual-path coupled image deraining network via spatial-frequency interaction[C]. 2024 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 2024: 1452–1458. doi: 10.1109/ICIP51287.2024.10647753.
Figures (10) / Tables (11)
Publication history
  • Received: 2025-10-09
  • Revised: 2026-02-05
  • Accepted: 2026-02-05
  • Published online: 2026-02-13
