Defending Deepfakes by Attribute-Aware Attack
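Abstract (translated from the Chinese): The illegal misuse of face deepfakes can cause serious personal and property harm. Traditional defenses based on gradient attacks achieve some protection through multi-round iteration in white-box settings, where the parameters of the forgery model are known, but under practical black-box attacks they often fall short of the currently dominant methods based on cross-model ensemble training with generative adversarial networks (GANs). Although GANs allow fast inference, their perturbation generation lacks perceptual constraints, so the perturbations are poorly concealed and hard to use in practice. In addition, rapidly evolving face forgery models place higher demands on the cross-model transferability of adversarial perturbations. This paper therefore proposes an attribute-aware adversarial example generation method that aims to improve perturbation imperceptibility while maintaining cross-model defense performance. On the one hand, considering only the foreground of the face image, the method uses the facial and hairstyle regions delineated by attribute saliency masks and generates region-specific adversarial perturbations with adaptive perturbation generators, which balances the imperceptibility and the attack strength of the adversarial examples. On the other hand, from a data augmentation perspective, it fuses frequency-domain phase information from a reference face image to produce more diverse input features, guarding against perturbation overfitting while improving transferability. Experimental results show that the proposed method achieves good defense performance in both quantitative and qualitative cross-model tests.
Abstract: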
Objective The illegal misuse of deepfakes can cause serious personal and property damage. To prevent the spread of forged images, existing methods often use adversarial examples to protect facial images from manipulation by deepfake models. However, traditional gradient-based attacks generalize poorly and generate examples inefficiently in black-box scenarios, lagging behind current methods that train cross-model defensive examples with generative adversarial networks (GANs). Although GAN-based methods enable fast inference, the lack of perceptual constraints often makes the generated adversarial perturbations visually noticeable. Moreover, the rapid evolution of deepfake models places higher demands on the generalization ability of adversarial examples. Developing imperceptible and generalizable adversarial attack methods is therefore crucial for proactive defense against deepfakes.
Methods To further improve the transferability and imperceptibility of adversarial examples generated by existing methods, this paper proposes an attribute-aware adversarial example generation method for deepfake defense. The method aims to generate imperceptible perturbations while enhancing cross-model generalization through a random identity fusion mechanism. Specifically, focusing on the foreground regions of facial images, it introduces attribute-aware salient segmentation of the facial and hairstyle regions and combines it with adaptive spatial-frequency attention to generate region-specific adversarial perturbations. This strategy improves the imperceptibility of the adversarial examples and also reduces the additional computational overhead caused by global processing. Furthermore, from the perspective of data augmentation, the method uses phase swapping in the frequency domain to fuse identity-related features from a reference face image, which prevents the perturbations from overfitting while improving generalization.
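The two ideas in the Methods paragraph, frequency-domain phase fusion with a reference face and region-restricted perturbation, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `phase_fuse` and `apply_masked_perturbation`, the blend weight `alpha`, the budget `eps`, and the use of a binary face/hair mask are all assumptions made for illustration, and the learned generators and attention modules are replaced by plain array operations.

```python
import numpy as np

def phase_fuse(source, reference, alpha=0.5):
    """Blend the Fourier phase of `reference` into `source` (H x W x C, values in [0, 1]).

    The amplitude spectrum of `source` is kept. `alpha` is a hypothetical
    blend weight: 0 leaves the source unchanged, 1 is a full phase swap.
    """
    f_src = np.fft.fft2(source, axes=(0, 1))
    f_ref = np.fft.fft2(reference, axes=(0, 1))
    amplitude = np.abs(f_src)
    # Interpolate unit phasors rather than raw angles to avoid wrap-around issues.
    phasor = ((1.0 - alpha) * np.exp(1j * np.angle(f_src))
              + alpha * np.exp(1j * np.angle(f_ref)))
    fused = np.fft.ifft2(amplitude * np.exp(1j * np.angle(phasor)), axes=(0, 1)).real
    return np.clip(fused, 0.0, 1.0)

def apply_masked_perturbation(image, perturbation, mask, eps=0.05):
    """Add a bounded perturbation only where `mask` (H x W) equals 1.

    `mask` stands in for the attribute-aware face/hair segmentation;
    `eps` is an assumed L-infinity budget, not a value from the paper.
    """
    delta = np.clip(perturbation, -eps, eps) * mask[..., None]
    return np.clip(image + delta, 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    face = rng.random((64, 64, 3))        # stand-in for a face image
    reference = rng.random((64, 64, 3))   # stand-in for a reference identity
    mask = np.zeros((64, 64))
    mask[16:48, 16:48] = 1.0              # stand-in for the face/hair region
    augmented = phase_fuse(face, reference, alpha=1.0)
    protected = apply_masked_perturbation(augmented, rng.normal(size=face.shape), mask)
    print(protected.shape)
```

In a training loop along these lines, `phase_fuse` would be applied as augmentation to the generator's input, and the mask would confine the generator's output to the salient regions, so that background pixels are never modified.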
Results and Discussions The method is trained and tested on the CelebA-HQ dataset against several proxy deepfake models. Compared with existing proactive defense methods, the proposed method generates adversarial examples with strong imperceptibility and cross-model defense capability, achieving a high defense success rate against the various proxy models. The average PSNR of the forged outputs produced from the perturbed images drops to 16.79 dB, about 1.87% lower than that of the second-best method (lower is better for this metric, since it indicates stronger disruption of the forgery). The defense success rate against HiSD improves by about 7.5% relative to the second-best method, and the success rate against AttGAN is about 12.7% higher than that of the second-best GAN-based defense method. Moreover, the LPIPS score of the adversarial examples indicates that the perturbations are highly imperceptible.
Conclusions This study proposes a facial attribute-aware attack method for deepfake defense. It incorporates a frequency-domain fusion mechanism to increase the diversity of the adversarial feature inputs. In addition, adaptive perturbation generators are designed to extract local facial information and dynamically adjust the adversarial features. This allows the method to retain perturbation components that are imperceptible yet effective for attack, thereby achieving strong cross-model generalization. Future work will focus on proactive defense methods against deepfakes with further improved imperceptibility and generalization, especially for cross-model transfer attack scenarios.
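The percentages quoted in the Results paragraph can be checked against Table 1 with a short calculation. The sketch below assumes that they are relative changes with respect to the baseline value, and that the baselines are AdvGAN (output PSNR 17.11 dB), the 85.0% HiSD success rate shared by LAE and SA, and PD-DWT (AttGAN success rate 74.0%). The abstract does not name these baselines; they are inferred because they reproduce the reported figures.

```python
def relative_change(ours, baseline):
    """Signed relative change of `ours` with respect to `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# Values read from Table 1 (CelebA-HQ test set).
print(round(relative_change(16.79, 17.11), 2))  # output PSNR vs. AdvGAN      -> -1.87
print(round(relative_change(91.4, 85.0), 1))    # HiSD success rate vs. 85.0  -> 7.5
print(round(relative_change(83.4, 74.0), 1))    # AttGAN rate vs. PD-DWT      -> 12.7
```

The negative sign in the first line reflects that the output PSNR decreases, which is the desired direction for that metric.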
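Table 1 Quantitative comparison of proactive defense performance on the CelebA-HQ test set (in the original, bold marks the best result and underline the second best; this emphasis is not preserved here). "Adv." columns measure the imperceptibility of the adversarial example, ASR is the attack success rate, and "Output" columns measure the forged output of the perturbed image.

| Group | Method | Adv. PSNR↑ | Adv. LPIPS↓ | ASR FGAN (%) | ASR AttGAN (%) | ASR HiSD (%) | ASR StarGAN (%) | ASR AGAN (%) | Output PSNR↓ | Output LPIPS↑ | Inference time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-ensemble | LAE[13][14] | 17.10 | 0.2551 | 78.0 | 81.0 | 85.0 | 100.0 | 91.0 | 16.72 | 0.2700 | 3.020 |
| Non-ensemble | D-D[7] | 35.40 | 0.0525 | 99.8 | 0.0 | 0.0 | 100.0 | 98.2 | 24.84 | 0.2163 | 1.040 |
| Non-ensemble | D-D(SA)[7][18] | 37.86 | 0.0188 | 98.4 | 0.0 | 0.0 | 100.0 | 97.0 | 30.74 | 0.2072 | 1.220 |
| Non-ensemble | A-F[8] | 40.86 | 0.0036 | 100.0 | 0.0 | 0.0 | 60.0 | 100.0 | 26.15 | 0.1909 | 9.130 |
| Cross-model ensemble | SA[18][14] | 32.11 | 0.0814 | 98.0 | 27.0 | 85.0 | 93.0 | 99.0 | 18.37 | 0.2272 | 5.470 |
| Cross-model ensemble | CMUA[9][14] | 32.45 | 0.1678 | 87.0 | 97.0 | 23.0 | 91.0 | 99.0 | 20.53 | 0.3306 | - |
| Cross-model ensemble | FOUND[11][14] | 33.23 | 0.1441 | 73.0 | 91.0 | 44.0 | 100.0 | 100.0 | 17.38 | 0.3688 | - |
| Cross-model ensemble | AdvGAN[10] | 30.35 | 0.0692 | 96.2 | 0.0 | 80.0 | 100.0 | 98.0 | 17.11 | 0.3022 | 0.116 |
| Cross-model ensemble | PD-DWT[14] | 32.91 | 0.0353 | 95.0 | 74.0 | 73.0 | 100.0 | 99.0 | 17.75 | 0.2871 | 0.124 |
| Cross-model ensemble | Ours | 32.62 | 0.0435 | 97.4 | 83.4 | 91.4 | 100.0 | 100.0 | 16.79 | 0.2849 | 0.167 |

Table 2 Quantitative comparison of the defense performance of adversarial example generation methods under transfer attacks on test sets from different sources

| Dataset | Method | Adv. PSNR↑ | Adv. SSIM↑ | ASR FGAN (%) | ASR AttGAN (%) | ASR HiSD (%) | ASR StarGAN (%) | ASR AGAN (%) | Output PSNR↓ | Output SSIM↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| FFHQ | AdvGAN[10] | 30.318 | 0.7835 | 100.0 | 0.0 | 77.2 | 100.0 | 100.0 | 18.676 | 0.6495 |
| FFHQ | PD-DWT[14] | 32.570 | 0.8637 | 99.0 | 87.6 | 78.2 | 99.6 | 100.0 | 17.142 | 0.6529 |
| FFHQ | Ours | 32.352 | 0.8578 | 100.0 | 84.8 | 92.4 | 100.0 | 100.0 | 16.527 | 0.6241 |
| LFW | AdvGAN[10] | 30.405 | 0.7698 | 99.6 | 0.0 | 79.2 | 100.0 | 100.0 | 17.536 | 0.6081 |
| LFW | PD-DWT[14] | 33.591 | 0.8666 | 99.6 | 89.6 | 78.0 | 99.6 | 100.0 | 18.527 | 0.7079 |
| LFW | Ours | 33.441 | 0.8923 | 100.0 | 84.0 | 87.6 | 99.2 | 100.0 | 17.518 | 0.6655 |
| FF++ | AdvGAN[10] | 30.305 | 0.7457 | 100.0 | 0.0 | 93.8 | 99.7 | 100.0 | 16.439 | 0.5998 |
| FF++ | PD-DWT[14] | 32.736 | 0.8663 | 100.0 | 100.0 | 80.9 | 100.0 | 100.0 | 17.710 | 0.6831 |
| FF++ | Ours | 32.684 | 0.8655 | 100.0 | 87.8 | 85.3 | 100.0 | 100.0 | 16.698 | 0.6521 |

Table 3 Ablation results for the cross-model ensemble defense of the proposed method ("w/o" denotes the variant with the numbered components removed)

| Variant | Adv. PSNR↑ | Adv. LPIPS↓ | ASR StarGAN (%) | ASR FGAN (%) | ASR AttGAN (%) | ASR HiSD (%) | ASR AGAN (%) | Output PSNR↓ | Output LPIPS↑ |
|---|---|---|---|---|---|---|---|---|---|
| w/o ①②③ | 25.55 | 0.2299 | 100.0 | 100.0 | 100.0 | 80.0 | 100.0 | 16.49 | 0.4741 |
| w/o ①② | 26.83 | 0.1303 | 100.0 | 99.8 | 95.6 | 87.4 | 99.8 | 15.62 | 0.4316 |
| w/o ①③ | 26.08 | 0.1500 | 100.0 | 99.4 | 100.0 | 86.2 | 100.0 | 14.95 | 0.4471 |
| w/o ②③ | 32.31 | 0.0689 | 99.0 | 100.0 | 43.0 | 63.2 | 99.8 | 17.92 | 0.3299 |
| w/o ① | 26.10 | 0.1592 | 100.0 | 97.6 | 100.0 | 89.4 | 100.0 | 14.54 | 0.4805 |
| w/o ② | 32.56 | 0.0459 | 99.6 | 96.2 | 70.0 | 95.4 | 100.0 | 17.03 | 0.3204 |
| w/o ③ | 32.16 | 0.0471 | 99.2 | 97.6 | 86.2 | 84.6 | 100.0 | 17.70 | 0.3102 |
| Full method | 32.62 | 0.0435 | 100.0 | 97.4 | 83.4 | 91.4 | 100.0 | 16.79 | 0.2849 |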
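The tables report PSNR in two directions: higher is better for the adversarial example, lower is better for the forged output. A minimal sketch of the standard PSNR definition may help in reading them. Which image pairs are compared is an assumption here: presumably the adversarial example against the clean input for the ↑ column, and the forged output of the perturbed image against the forged output of the clean image for the ↓ column. The function name `psnr` and the `max_val` convention (pixel values in [0, 1]) are illustrative.

```python
import numpy as np

def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))

if __name__ == "__main__":
    clean = np.zeros((4, 4, 3))
    shifted = np.full((4, 4, 3), 0.1)   # uniform error of 0.1 gives MSE = 0.01
    print(round(psnr(clean, shifted), 2))  # -> 20.0
```

Under this definition, a perturbation that is hard to see keeps the first PSNR high, while an effective defense distorts the forgery and pushes the second PSNR down.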
[1] PENG Chunlei, LUO Xiaoyi, LIU Decheng, et al. Semantic token transformer for face forgery detection[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 4904–4914. doi: 10.1109/TIFS.2025.3567110.
[2] LIU Pengyu, ZHENG Tianyang, and DONG Min. A fake attention map-driven multi-task deepfake video detection model[J]. Journal of Electronics & Information Technology, 2026, 48(1): 346–358. doi: 10.11999/JEIT250926. (in Chinese)
[3] GOODFELLOW I J, SHLENS J, and SZEGEDY C. Explaining and harnessing adversarial examples[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015. doi: 10.48550/arXiv.1412.6572.
[4] LIU Decheng, SU Qixuan, PENG Chunlei, et al. Imperceptible face forgery attack via adversarial semantic mask[EB/OL]. https://doi.org/10.48550/arXiv.2406.10887, 2024.
[5] DEB D, ZHANG Jianbang, and JAIN A K. AdvFaces: Adversarial face synthesis[C]. Proceedings of 2020 IEEE International Joint Conference on Biometrics (IJCB), Houston, USA, 2020: 1–10. doi: 10.1109/IJCB48548.2020.9304898.
[6] QU Zuomin, YIN Qilin, SHENG Ziqi, et al. Overview of deepfake proactive defense techniques[J]. Journal of Image and Graphics, 2024, 29(2): 318–342. doi: 10.11834/jig.230128. (in Chinese)
[7] RUIZ N, BARGAL S A, and SCLAROFF S. Disrupting deepfakes: Adversarial attacks against conditional image translation networks and facial manipulation systems[C]. 16th European Conference on Computer Vision, Glasgow, UK, 2020: 236–251. doi: 10.1007/978-3-030-66823-5_14.
[8] WANG Run, HUANG Ziheng, CHEN Zhikai, et al. Anti-forgery: Towards a stealthy and robust DeepFake disruption attack via adversarial perceptual-aware perturbations[C]. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 2022: 761–767. doi: 10.24963/ijcai.2022/107.
[9] HUANG Hao, WANG Yongtao, CHEN Zhaoyu, et al. CMUA-watermark: A cross-model universal adversarial watermark for combating deepfakes[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, Palo Alto, USA, 2022: 989–997. doi: 10.1609/aaai.v36i1.19982.
[10] XIAO Chaowei, LI Bo, ZHU Junyan, et al. Generating adversarial examples with adversarial networks[C]. Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 3905–3911. doi: 10.24963/ijcai.2018/543.
[11] TANG Long, YE Dengpan, LU Zhenhao, et al. Feature extraction matters more: An effective and efficient universal deepfake disruptor[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2025, 21(2): 46. doi: 10.1145/3653457.
[12] WANG Jinwei, ZENG Kehui, ZHANG Jiawei, et al. GAN-generated face detection based on space-frequency convolutional neural network[J]. Computer Science, 2023, 50(6): 216–224. doi: 10.11896/jsjkx.220400268. (in Chinese)
[13] HE Ziwen, WANG Wei, GUAN Weinan, et al. Defeating deepfakes via adversarial visual reconstruction[C]. Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 2022: 2464–2472. doi: 10.1145/3503161.3547923.
[14] HONG Yuting and CHEN Beijing. Imperceptible proactive defense against second facial attribute editing[J]. Journal of Computer-Aided Design & Computer Graphics, 2025: 1–10. doi: 10.3724/SP.J.1089.2024-00316. (in Chinese)
[15] FENG Yixiang and HUANG Fangjun. Compression-resistant adversarial perturbation for real-world proactive defense against deepfakes[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2026, 36(4): 5161–5172. doi: 10.1109/TCSVT.2025.3626505.
[16] QU Zuomin, XI Zuping, LU Wei, et al. DF-RAP: A robust adversarial perturbation for defending against deepfakes in real-world social network scenarios[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 3943–3957. doi: 10.1109/TIFS.2024.3372803.
[17] BYUN J, GO H, and KIM C. Geometrically adaptive dictionary attack on face recognition[C]. Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2022: 3809–3818. doi: 10.1109/WACV51458.2022.00386.
[18] LI Qilei, GAO Mingliang, ZHANG Guisheng, et al. Defending deepfakes by saliency-aware attack[J]. IEEE Transactions on Computational Social Systems, 2024, 11(4): 5060–5067. doi: 10.1109/TCSS.2023.3271121.
[19] YANG Yong, LI Changjiang, JIANG Yi, et al. Invisible-face: Rethinking facial attribute privacy in social media photo sharing[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 6101–6116. doi: 10.1109/TIFS.2025.3579592.
[20] WU Tao, JI Qionghui, XIAN Xingping, et al. Information entropy-driven black-box transferable adversarial attack method for graph neural networks[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3814–3825. doi: 10.11999/JEIT250303. (in Chinese)
[21] TOKOZUME Y, USHIKU Y, and HARADA T. Between-class learning for image classification[C]. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5486–5494. doi: 10.1109/CVPR.2018.00575.
[22] HENDRYCKS D, ZOU A, MAZEIKA M, et al. PixMix: Dreamlike pictures comprehensively improve safety measures[C]. Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 16783–16792. doi: 10.1109/CVPR52688.2022.01628.
[23] QIAN Yaguan, KONG Yaxin, CHEN Kecheng, et al. Adversarial transferability attack on deep neural networks through spectral coefficient decay[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3847–3857. doi: 10.11999/JEIT250157. (in Chinese)
[24] LING Hai and LING Jie. Transferability enhancement of adversarial sample directed targeted attack based on feature fusion[J]. Computer Engineering, 2025, 51(11): 162–170. doi: 10.19678/j.issn.1000-3428.0069983. (in Chinese)
[25] YU Hu, ZHENG Naishan, ZHOU Man, et al. Frequency and spatial dual guidance for image dehazing[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 181–198. doi: 10.1007/978-3-031-19800-7_11.
[26] SHEN Yu, BAI Shan, WEI Ziyi, et al. Medical image fusion network for cross-modality perception and spatial-frequency interaction[J]. Chinese Journal of Lasers, 2025, 52(9): 0907106. doi: 10.3788/CJL241333. (in Chinese)
[27] TORBUNOV D, HUANG Yi, YU Haiwan, et al. UVCGAN: UNet vision transformer cycle-consistent GAN for unpaired image-to-image translation[C]. Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 702–712. doi: 10.1109/WACV56688.2023.00077.
[28] XIONG Zihao, ZHOU Fei, WU Fengyi, et al. DRPCA-Net: Make robust PCA great again for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5005516. doi: 10.1109/TGRS.2025.3588392.
[29] ZHU Hao, WU W, ZHU Wentao, et al. CelebV-HQ: A large-scale video facial attributes dataset[C]. Proceedings of the 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 650–667. doi: 10.1007/978-3-031-20071-7_38.
[30] YU Changqian, GAO Changxin, WANG Jingbo, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051–3068. doi: 10.1007/s11263-021-01515-2.
[31] SIDDIQUEE M M R, ZHOU Zongwei, TAJBAKHSH N, et al. Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization[C]. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 191–200. doi: 10.1109/ICCV.2019.00028.
[32] HE Zhenliang, ZUO Wangmeng, KAN Meina, et al. AttGAN: Facial attribute editing by only changing what you want[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5464–5478. doi: 10.1109/TIP.2019.2916751.
[33] LI Xinyang, ZHANG Shengchuan, HU Jie, et al. Image-to-image translation via hierarchical style disentanglement[C]. Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8635–8644. doi: 10.1109/CVPR46437.2021.00853.
[34] CHOI Y, CHOI M, KIM M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation[C]. Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8789–8797. doi: 10.1109/CVPR.2018.00916.
[35] TANG Hao, LIU Hong, XU Dan, et al. AttentionGAN: Unpaired image-to-image translation using attention-guided generative adversarial networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 1972–1987. doi: 10.1109/TNNLS.2021.3105725.
[36] KARRAS T, LAINE S, and AILA T. A style-based generator architecture for generative adversarial networks[C]. Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4396–4405. doi: 10.1109/CVPR.2019.00453.
[37] RÖSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: Learning to detect manipulated facial images[C]. Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 1–11. doi: 10.1109/ICCV.2019.00009.