Attribute-Aware Adversarial Example Generation Method for Face Forgery Defense

GAO Fan; YAN Weidan; SHAO Wenze; ZHANG Dengyin

doi:10.11999/JEIT260043

cstr: 32379.14.JEIT260043

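1. School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China

Funding: National Natural Science Foundation of China (62471241, 92470126)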

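Details

Author biographies:
GAO Fan: male, master's student. Research interest: deepfake defense.

YAN Weidan: male, PhD student. Research interests: image processing and deepfake defense.

SHAO Wenze: male, PhD, professor. Research interests: variational methods, computational statistics, representation learning, and their applications in imaging and vision.

ZHANG Dengyin: male, PhD, research professor. Research interests: modern communication networks, and signal and information processing.

Corresponding author:
ZHANG Dengyin, zhangdy@njupt.edu.cn

CLC number: TN911.73; TP391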
Publication history
- Received: 2026-01-12
- Revised: 2026-05-29
- Accepted: 2026-05-29
- Published online: 2026-06-08

Defending against Deepfakes by Attribute-Aware Attack


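Abstract

Summary: Illegal misuse of face deepfakes can cause serious harm to persons and property. Current face forgery defenses suffer from insufficient cross-model transferability of adversarial examples and from the difficulty of balancing imperceptibility against attack effectiveness. To address these problems, this paper proposes an attribute-aware adversarial example generation method that improves the imperceptibility of adversarial perturbations while retaining cross-model defense performance. On the one hand, the method considers only the foreground of the face image, uses attribute saliency masks to delineate the facial and hairstyle regions, and generates region-specific adversarial perturbations with adaptive perturbation generators, which effectively balances the imperceptibility and attack strength of the adversarial examples. On the other hand, from a data augmentation perspective, it fuses the frequency-domain phase information of reference face images to produce more diverse input features, which guards against perturbation overfitting and improves transferability. Experimental results show that the proposed method achieves good defense performance in both quantitative and qualitative tests and is suitable for cross-model defense scenarios.

Keywords:
- Deepfake defense
- Adversarial examples
- Generative adversarial networks
- Attribute attack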
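
The region-specific perturbation idea in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `compose_adversarial`, the binary `face_mask` and `hair_mask` arrays, the random stand-in perturbations, and the L-infinity budget `eps` are all hypothetical. In the paper the perturbations come from learned adaptive generators and the masks from an attribute segmentation step.

```python
import numpy as np

def compose_adversarial(image, face_mask, hair_mask, delta_face, delta_hair, eps=0.05):
    """Add region-specific perturbations only inside the masked foreground.

    image:      float array in [0, 1], shape (H, W, C)
    face_mask:  binary array, shape (H, W, 1); 1 inside the face region
    hair_mask:  binary array, shape (H, W, 1); 1 inside the hair region
    delta_*:    raw perturbations, shape (H, W, C)
    eps:        hypothetical L-infinity budget (not taken from the paper)
    """
    # Clip each perturbation to the budget, then keep it only in its own region.
    delta = (face_mask * np.clip(delta_face, -eps, eps)
             + hair_mask * np.clip(delta_hair, -eps, eps))
    # Background pixels (both masks zero) receive no perturbation.
    return np.clip(image + delta, 0.0, 1.0)

# Toy example: rows 0-1 are "hair", rows 4-7 are "face", rows 2-3 are background.
rng = np.random.default_rng(0)
image = rng.uniform(0.2, 0.8, size=(8, 8, 3))
face_mask = np.zeros((8, 8, 1))
face_mask[4:, :, :] = 1.0
hair_mask = np.zeros((8, 8, 1))
hair_mask[:2, :, :] = 1.0
delta_face = rng.normal(0.0, 0.1, size=image.shape)  # stand-in for a generator output
delta_hair = rng.normal(0.0, 0.1, size=image.shape)  # stand-in for a generator output
adv = compose_adversarial(image, face_mask, hair_mask, delta_face, delta_hair)
```

Restricting the perturbation to masked regions is what leaves the background untouched, which is one plausible reason a foreground-only design can help imperceptibility.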
Abstract:

Objective: Deepfakes can cause serious personal and property damage when misused. To prevent forged images from spreading, existing methods often use adversarial examples to protect facial images from deepfake manipulation. However, traditional gradient-based attacks show limited generalization and low generation efficiency in black-box attack scenarios. Their performance is also weaker than that of current methods based on Generative Adversarial Networks (GANs), which are used to train cross-model adversarial examples. Although GAN-based methods support fast inference, their lack of perceptual constraints often makes the generated adversarial perturbations visually noticeable. The rapid development of deepfake models also raises higher requirements for the generalization ability of adversarial examples. Therefore, imperceptible and generalizable adversarial attack methods are needed for proactive deepfake defense.

Methods: To further improve the transferability and imperceptibility of adversarial examples generated by existing methods, this paper proposes an attribute-aware adversarial example generation method for deepfake defense. The proposed method generates imperceptible perturbations and improves cross-model generalization through a frequency-domain identity fusion mechanism. Specifically, it focuses on the foreground regions of facial images, uses attribute-aware salient segmentation masks to separate facial and hairstyle regions, and combines these masks with adaptive spatial-frequency attention-based perturbation generators to generate region-specific adversarial perturbations. This strategy improves the imperceptibility of adversarial examples and reduces the additional computational cost caused by global processing. From the perspective of data augmentation, this paper further uses phase swapping in the frequency domain to fuse identity-related features from reference face images. This design reduces perturbation overfitting and improves generalization performance.

Results and Discussions: The proposed method is trained and tested on the CelebA-HQ dataset using proxy models. Compared with existing proactive defense methods, the experimental results show that the proposed method generates adversarial examples with strong imperceptibility and cross-model defense capability. It achieves a high defense success rate against various proxy models. The average Peak Signal-to-Noise Ratio (PSNR) of forged outputs under adversarial perturbations is reduced to 16.79 dB, representing an improvement of approximately 1.87% over the second-best method. Defense performance against HiSD is improved by approximately 7.5% compared with the second-best method. Defense performance against AttGAN is approximately 12.7% higher than that of the second-best GAN-based defense method. Moreover, the Learned Perceptual Image Patch Similarity (LPIPS) metric shows that the adversarial perturbations have high imperceptibility.

Conclusions: This study proposes a facial attribute-aware attack method for deepfake defense. The method incorporates a frequency-domain identity fusion mechanism to increase the diversity of adversarial feature inputs. Adaptive spatial-frequency attention-based perturbation generators are also designed to extract local facial information and dynamically adjust adversarial features. These designs allow the method to preserve perturbation components that are both imperceptible and attack-effective, leading to strong cross-model generalization. Future work will focus on proactive deepfake defense methods with improved imperceptibility and generalization, especially in cross-model transfer attack scenarios.
Keywords:
- Deepfakes defense
- Adversarial examples
- Generative Adversarial Networks (GAN)
- Attribute-aware attack
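
The frequency-domain phase swapping mentioned in the abstract can be illustrated with a short NumPy sketch. The abstract does not give the exact fusion rule, so this shows only one plausible reading: keep the amplitude spectrum of the input face and take the phase spectrum from a reference face. The function name `phase_swap` and the random test images are hypothetical.

```python
import numpy as np

def phase_swap(source, reference):
    """Combine the amplitude spectrum of `source` with the phase spectrum of `reference`.

    Both inputs are real float arrays of identical shape, (H, W) or (H, W, C).
    This is an assumed fusion rule, not necessarily the one used in the paper.
    """
    # 2D FFT over the two spatial axes only, so channels are handled independently.
    f_src = np.fft.fft2(source, axes=(0, 1))
    f_ref = np.fft.fft2(reference, axes=(0, 1))
    amplitude = np.abs(f_src)
    phase = np.angle(f_ref)
    fused = amplitude * np.exp(1j * phase)
    # Inputs are real, so the recombined spectrum stays conjugate-symmetric and
    # the imaginary part of the inverse transform is only numerical noise.
    return np.real(np.fft.ifft2(fused, axes=(0, 1)))

rng = np.random.default_rng(0)
face = rng.random((16, 16, 3))        # stand-in for the image to protect
reference = rng.random((16, 16, 3))   # stand-in for a reference face
augmented = phase_swap(face, reference)
```

Swapping an image's phase with its own returns the original image, which is a convenient sanity check for this kind of augmentation.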

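Figure 1 Visual comparison of facial prior knowledge
Figure 2 Overall architecture of the proposed method
Figure 3 Network structure of the adaptive adversarial perturbation generator and the spatial-frequency attention module
Figure 4 Example visual comparison of proactive defense results of different methods
Figure 5 Example visualization of cross-model defense results of the proposed method
Figure 6 Visual comparison of defense results of adversarial example generation methods under transfer attacks on different test sets
Figure 7 Visual comparison of proactive defense results guided by saliency segmentation masks and attribute segmentation masks
Figure 8 Visualization of cross-model defense ablation results of the proposed method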

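Table 1 Quantitative comparison of proactive defense performance on the CelebA-HQ test set

Imperceptibility is measured on the adversarial example (PSNR↑, LPIPS↓). Proactive defense performance is reported as the attack success rate (%) against each forgery model and as the PSNR↓ and LPIPS↑ of the perturbed output.

| Type | Method | Imperceptibility PSNR↑ (dB) | Imperceptibility LPIPS↓ | FGAN (%) | AttGAN (%) | HiSD (%) | StarGAN (%) | AGAN (%) | Perturbed output PSNR↓ (dB) | Perturbed output LPIPS↑ | Inference time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Non-ensemble | LAE [13,14] | 17.10 | 0.2551 | 78.0 | 81.0 | 85.0 | 100.0 | 91.0 | 16.72 | 0.2700 | 3.020 |
| Non-ensemble | D-D [7] | 35.40 | 0.0525 | 99.8 | 0.0 | 0.0 | 100.0 | 98.2 | 24.84 | 0.2163 | 1.040 |
| Non-ensemble | D-D(SA) [7,18] | 37.86 | 0.0188 | 98.4 | 0.0 | 0.0 | 100.0 | 97.0 | 30.74 | 0.2072 | 1.220 |
| Non-ensemble | A-F [8] | 40.86 | 0.0036 | 100.0 | 0.0 | 0.0 | 60.0 | 100.0 | 26.15 | 0.1909 | 9.130 |
| Cross-model ensemble | SA [14,18] | 32.11 | 0.0814 | 98.0 | 27.0 | 85.0 | 93.0 | 99.0 | 18.37 | 0.2272 | 5.470 |
| Cross-model ensemble | CMUA [9,14] | 32.45 | 0.1678 | 87.0 | 97.0 | 23.0 | 91.0 | 99.0 | 20.53 | 0.3306 | - |
| Cross-model ensemble | FOUND [11,14] | 33.23 | 0.1441 | 73.0 | 91.0 | 44.0 | 100.0 | 100.0 | 17.38 | 0.3688 | - |
| Cross-model ensemble | AdvGAN [10] | 30.35 | 0.0692 | 96.2 | 0.0 | 80.0 | 100.0 | 98.0 | 17.11 | 0.3022 | 0.116 |
| Cross-model ensemble | PD-DWT [14] | 32.91 | 0.0353 | 95.0 | 74.0 | 73.0 | 100.0 | 99.0 | 17.75 | 0.2871 | 0.124 |
| Cross-model ensemble | Proposed | 32.62 | 0.0435 | 97.4 | 83.4 | 91.4 | 100.0 | 100.0 | 16.79 | 0.2849 | 0.167 |

Note: bold in the original table marks the best result.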
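
The percentage gains quoted in the abstract can be reproduced from Table 1 as relative differences. The baselines picked below (AdvGAN for perturbed-output PSNR, SA for HiSD, PD-DWT for AttGAN) are one reading of which rows count as "second-best"; they are chosen because they reproduce the quoted figures and are not stated explicitly in the source. The helper name `relative_change` is hypothetical.

```python
def relative_change(ours, baseline):
    """Relative difference of `ours` with respect to `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0

# Values copied from the cross-model ensemble rows of Table 1.
psnr_out = relative_change(16.79, 17.11)   # perturbed-output PSNR vs. AdvGAN (lower is better)
hisd_asr = relative_change(91.4, 85.0)     # HiSD attack success rate vs. SA
attgan_asr = relative_change(83.4, 74.0)   # AttGAN attack success rate vs. PD-DWT

# The PSNR change is negative (a reduction), so its sign is flipped for reporting.
print(f"{-psnr_out:.2f}% {hisd_asr:.1f}% {attgan_asr:.1f}%")  # prints "1.87% 7.5% 12.7%"
```

Read this way, the gains are relative to the baseline value, not differences in percentage points.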

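Table 2 Quantitative comparison of defense performance of adversarial example generation methods under transfer attacks on test sets from different sources

Imperceptibility is measured on the adversarial example (PSNR↑, SSIM↑). Proactive defense performance is reported as the attack success rate (%) against each forgery model and as the PSNR↓ and SSIM↓ of the perturbed output.

| Dataset | Method | Imperceptibility PSNR↑ (dB) | Imperceptibility SSIM↑ | FGAN (%) | AttGAN (%) | HiSD (%) | StarGAN (%) | AGAN (%) | Perturbed output PSNR↓ (dB) | Perturbed output SSIM↓ |
|---|---|---|---|---|---|---|---|---|---|---|
| FFHQ | AdvGAN [10] | 30.318 | 0.7835 | 100.0 | 0.0 | 77.2 | 100.0 | 100.0 | 18.676 | 0.6495 |
| FFHQ | PD-DWT [14] | 32.570 | 0.8637 | 99.0 | 87.6 | 78.2 | 99.6 | 100.0 | 17.142 | 0.6529 |
| FFHQ | Proposed | 32.352 | 0.8578 | 100.0 | 84.8 | 92.4 | 100.0 | 100.0 | 16.527 | 0.6241 |
| LFW | AdvGAN [10] | 30.405 | 0.7698 | 99.6 | 0.0 | 79.2 | 100.0 | 100.0 | 17.536 | 0.6081 |
| LFW | PD-DWT [14] | 33.591 | 0.8666 | 99.6 | 89.6 | 78.0 | 99.6 | 100.0 | 18.527 | 0.7079 |
| LFW | Proposed | 33.441 | 0.8923 | 100.0 | 84.0 | 87.6 | 99.2 | 100.0 | 17.518 | 0.6655 |
| FF++ | AdvGAN [10] | 30.305 | 0.7457 | 100.0 | 0.0 | 93.8 | 99.7 | 100.0 | 16.439 | 0.5998 |
| FF++ | PD-DWT [14] | 32.736 | 0.8663 | 100.0 | 100.0 | 80.9 | 100.0 | 100.0 | 17.710 | 0.6831 |
| FF++ | Proposed | 32.684 | 0.8655 | 100.0 | 87.8 | 85.3 | 100.0 | 100.0 | 16.698 | 0.6521 |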
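
PSNR appears in the tables with both arrows: higher is better when comparing the adversarial example with the clean image (imperceptibility), and lower is better for the perturbed forged output (stronger disruption). A minimal sketch of the standard PSNR formula follows. The source does not specify the pixel range, color space, or reference image used in the evaluation, so the `max_value=1.0` default and the toy images are assumptions.

```python
import numpy as np

def psnr(reference, test, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    mse = np.mean((reference - test) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))

clean = np.full((4, 4, 3), 0.5)
noisy = clean + 0.1            # uniform offset, so MSE = 0.01
value = psnr(clean, noisy)     # approximately 20 dB
```

On this scale, an imperceptibility PSNR in the low 30s corresponds to a mean squared error below 0.001 for images in [0, 1].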

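Table 3 Ablation results for cross-model ensemble defense with the proposed method

Imperceptibility is measured on the adversarial example (PSNR↑, LPIPS↓). Proactive defense performance is reported as the attack success rate (%) against each forgery model and as the PSNR↓ and LPIPS↑ of the perturbed output.

| Variant | Imperceptibility PSNR↑ (dB) | Imperceptibility LPIPS↓ | StarGAN (%) | FGAN (%) | AttGAN (%) | HiSD (%) | AGAN (%) | Perturbed output PSNR↓ (dB) | Perturbed output LPIPS↑ |
|---|---|---|---|---|---|---|---|---|---|
| w/o ①②③ | 25.55 | 0.2299 | 100.0 | 100.0 | 100.0 | 80.0 | 100.0 | 16.49 | 0.4741 |
| w/o ①② | 26.83 | 0.1303 | 100.0 | 99.8 | 95.6 | 87.4 | 99.8 | 15.62 | 0.4316 |
| w/o ①③ | 26.08 | 0.1500 | 100.0 | 99.4 | 100.0 | 86.2 | 100.0 | 14.95 | 0.4471 |
| w/o ②③ | 32.31 | 0.0689 | 99.0 | 100.0 | 43.0 | 63.2 | 99.8 | 17.92 | 0.3299 |
| w/o ① | 26.10 | 0.1592 | 100.0 | 97.6 | 100.0 | 89.4 | 100.0 | 14.54 | 0.4805 |
| w/o ② | 32.56 | 0.0459 | 99.6 | 96.2 | 70.0 | 95.4 | 100.0 | 17.03 | 0.3204 |
| w/o ③ | 32.16 | 0.0471 | 99.2 | 97.6 | 86.2 | 84.6 | 100.0 | 17.70 | 0.3102 |
| Full method | 32.62 | 0.0435 | 100.0 | 97.4 | 83.4 | 91.4 | 100.0 | 16.79 | 0.2849 |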

References (37)

[1]	PENG Chunlei, LUO Xiaoyi, LIU Decheng, et al. Semantic token transformer for face forgery detection[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 4904–4914. doi: 10.1109/TIFS.2025.3567110.
[2]	LIU Pengyu, ZHENG Tianyang, and DONG Min. A fake attention map-driven multi-task deepfake video detection model[J]. Journal of Electronics & Information Technology, 2026, 48(1): 346–358. doi: 10.11999/JEIT250926.
[3]	GOODFELLOW I J, SHLENS J, and SZEGEDY C. Explaining and harnessing adversarial examples[C]. 3rd International Conference on Learning Representations, San Diego, USA, 2015. doi: 10.48550/arXiv.1412.6572.
[4]	LIU Decheng, SU Qixuan, PENG Chunlei, et al. Imperceptible face forgery attack via adversarial semantic mask[EB/OL]. https://doi.org/10.48550/arXiv.2406.10887, 2024.
[5]	DEB D, ZHANG Jianbang, and JAIN A K. AdvFaces: Adversarial face synthesis[C]. 2020 IEEE International Joint Conference on Biometrics (IJCB), Houston, USA, 2020: 1–10. doi: 10.1109/IJCB48548.2020.9304898.
[6]	QU Zuomin, YIN Qilin, SHENG Ziqi, et al. Overview of deepfake proactive defense techniques[J]. Journal of Image and Graphics, 2024, 29(2): 318–342. doi: 10.11834/jig.230128.
[7]	RUIZ N, BARGAL S A, and SCLAROFF S. Disrupting deepfakes: Adversarial attacks against conditional image translation networks and facial manipulation systems[C]. The 16th European Conference on Computer Vision, Glasgow, UK, 2020: 236–251. doi: 10.1007/978-3-030-66823-5_14.
[8]	WANG Run, HUANG Ziheng, CHEN Zhikai, et al. Anti-forgery: Towards a stealthy and robust DeepFake disruption attack via adversarial perceptual-aware perturbations[C]. The Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 2022: 761–767. doi: 10.24963/ijcai.2022/107.
[9]	HUANG Hao, WANG Yongtao, CHEN Zhaoyu, et al. CMUA-watermark: A cross-model universal adversarial watermark for combating deepfakes[C]. The 36th AAAI Conference on Artificial Intelligence, Palo Alto, USA, 2022: 989–997. doi: 10.1609/aaai.v36i1.19982.
[10]	XIAO Chaowei, LI Bo, ZHU Junyan, et al. Generating adversarial examples with adversarial networks[C]. The 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018: 3905–3911. doi: 10.24963/ijcai.2018/543.
[11]	TANG Long, YE Dengpan, LU Zhenhao, et al. Feature extraction matters more: An effective and efficient universal deepfake disruptor[J]. ACM Transactions on Multimedia Computing, Communications, and Applications, 2025, 21(2): 46. doi: 10.1145/3653457.
[12]	WANG Jinwei, ZENG Kehui, ZHANG Jiawei, et al. GAN-generated face detection based on space-frequency convolutional neural network[J]. Computer Science, 2023, 50(6): 216–224. doi: 10.11896/jsjkx.220400268.
[13]	HE Ziwen, WANG Wei, GUAN Weinan, et al. Defeating deepfakes via adversarial visual reconstruction[C]. The 30th ACM International Conference on Multimedia, Lisboa, Portugal, 2022: 2464–2472. doi: 10.1145/3503161.3547923.
[14]	HONG Yuting and CHEN Beijing. Imperceptible proactive defense against second facial attribute editing[J]. Journal of Computer-Aided Design & Computer Graphics, 2025: 1–10. doi: 10.3724/SP.J.1089.2024-00316.
[15]	FENG Yixiang and HUANG Fangjun. Compression-resistant adversarial perturbation for real-world proactive defense against deepfakes[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2026, 36(4): 5161–5172. doi: 10.1109/TCSVT.2025.3626505.
[16]	QU Zuomin, XI Zuping, LU Wei, et al. DF-RAP: A robust adversarial perturbation for defending against deepfakes in real-world social network scenarios[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 3943–3957. doi: 10.1109/TIFS.2024.3372803.
[17]	BYUN J, GO H, and KIM C. Geometrically adaptive dictionary attack on face recognition[C]. The 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2022: 3809–3818. doi: 10.1109/WACV51458.2022.00386.
[18]	LI Qilei, GAO Mingliang, ZHANG Guisheng, et al. Defending deepfakes by saliency-aware attack[J]. IEEE Transactions on Computational Social Systems, 2024, 11(4): 5060–5067. doi: 10.1109/TCSS.2023.3271121.
[19]	YANG Yong, LI Changjiang, JIANG Yi, et al. Invisible-face: Rethinking facial attribute privacy in social media photo sharing[J]. IEEE Transactions on Information Forensics and Security, 2025, 20: 6101–6116. doi: 10.1109/TIFS.2025.3579592.
[20]	WU Tao, JI Qionghui, XIAN Xingping, et al. Information entropy-driven black-box transferable adversarial attack method for graph neural networks[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3814–3825. doi: 10.11999/JEIT250303.
[21]	TOKOZUME Y, USHIKU Y, and HARADA T. Between-class learning for image classification[C]. The 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 5486–5494. doi: 10.1109/CVPR.2018.00575.
[22]	HENDRYCKS D, ZOU A, MAZEIKA M, et al. PixMix: Dreamlike pictures comprehensively improve safety measures[C]. The 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, USA, 2022: 16783–16792. doi: 10.1109/CVPR52688.2022.01628.
[23]	LING Hai and LING Jie. Transferability enhancement of adversarial sample directed targeted attack based on feature fusion[J]. Computer Engineering, 2025, 51(11): 162–170. doi: 10.19678/j.issn.1000-3428.0069983.
[24]	QIAN Yaguan, KONG Yaxin, CHEN Kecheng, et al. Adversarial transferability attack on deep neural networks through spectral coefficient decay[J]. Journal of Electronics & Information Technology, 2025, 47(10): 3847–3857. doi: 10.11999/JEIT250157.
[25]	YU Hu, ZHENG Naishan, ZHOU Man, et al. Frequency and spatial dual guidance for image dehazing[C]. 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 181–198. doi: 10.1007/978-3-031-19800-7_11.
[26]	SHEN Yu, BAI Shan, WEI Ziyi, et al. Medical image fusion network for cross-modality perception and spatial-frequency interaction[J]. Chinese Journal of Lasers, 2025, 52(9): 0907106. doi: 10.3788/CJL241333.
[27]	TORBUNOV D, HUANG Yi, YU Haiwan, et al. UVCGAN: UNet vision transformer cycle-consistent GAN for unpaired image-to-image translation[C]. The 2023 IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, USA, 2023: 702–712. doi: 10.1109/WACV56688.2023.00077.
[28]	XIONG Zihao, ZHOU Fei, WU Fengyi, et al. DRPCA-Net: Make robust PCA great again for infrared small target detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2025, 63: 5005516. doi: 10.1109/TGRS.2025.3588392.
[29]	ZHU Hao, WU W, ZHU Wentao, et al. CelebV-HQ: A large-scale video facial attributes dataset[C]. The 17th European Conference on Computer Vision, Tel Aviv, Israel, 2022: 650–667. doi: 10.1007/978-3-031-20071-7_38.
[30]	YU Changqian, GAO Changxin, WANG Jingbo, et al. BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation[J]. International Journal of Computer Vision, 2021, 129(11): 3051–3068. doi: 10.1007/s11263-021-01515-2.
[31]	SIDDIQUEE M M R, ZHOU Zongwei, TAJBAKHSH N, et al. Learning fixed points in generative adversarial networks: From image-to-image translation to disease detection and localization[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 191–200. doi: 10.1109/ICCV.2019.00028.
[32]	HE Zhenliang, ZUO Wangmeng, KAN Meina, et al. AttGAN: Facial attribute editing by only changing what you want[J]. IEEE Transactions on Image Processing, 2019, 28(11): 5464–5478. doi: 10.1109/TIP.2019.2916751.
[33]	LI Xinyang, ZHANG Shengchuan, HU Jie, et al. Image-to-image translation via hierarchical style disentanglement[C]. The 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, USA, 2021: 8635–8644. doi: 10.1109/CVPR46437.2021.00853.
[34]	CHOI Y, CHOI M, KIM M, et al. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation[C]. The 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018: 8789–8797. doi: 10.1109/CVPR.2018.00916.
[35]	TANG Hao, LIU Hong, XU Dan, et al. AttentionGAN: Unpaired image-to-image translation using attention-guided generative adversarial networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(4): 1972–1987. doi: 10.1109/TNNLS.2021.3105725.
[36]	KARRAS T, LAINE S, and AILA T. A style-based generator architecture for generative adversarial networks[C]. The 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, USA, 2019: 4396–4405. doi: 10.1109/CVPR.2019.00453.
[37]	RÖSSLER A, COZZOLINO D, VERDOLIVA L, et al. FaceForensics++: Learning to detect manipulated facial images[C]. The 2019 IEEE/CVF International Conference on Computer Vision, Seoul, South Korea, 2019: 1–11. doi: 10.1109/ICCV.2019.00009.