Phase Shift-Based Covert Backdoor Attack Strategy in Deep Neural Networks
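Summary: Backdoor attacks pose a serious threat to the security of deep neural networks (DNNs). Once a backdoor is implanted, the model produces a preset erroneous output when it encounters an input carrying a specific trigger, while maintaining normal performance on clean samples. Existing work has explored trigger design in both the spatial and frequency domains, but most methods sacrifice trigger imperceptibility to secure a high attack success rate (ASR). This paper proposes a frequency-domain backdoor attack based on phase shift (FDPS). The method maps an image to the frequency domain with the discrete Fourier transform (DFT) and embeds the trigger by applying phase perturbations to selected frequency components. Specifically, FDPS preferentially makes fine adjustments to mid- and high-frequency phase components, so as to minimize changes to the amplitude spectrum and avoid introducing perceptible artifacts. Because phase information governs the relative displacement of the sinusoidal components, such perturbations stay consistent with the visual semantics of the image, which significantly improves stealth. Compared with conventional amplitude-perturbation strategies, phase shifting preserves the global structure of the image while more effectively evading image-based defense and detection mechanisms. Experiments show that FDPS compares favorably with baseline backdoor attacks such as BadNets, Blend, WaNet, and Ftrojan on metrics including ASR, clean-sample accuracy, and the structural similarity index (SSIM). In addition, on the GTSRB dataset, poisoning only 2% of the training samples is enough to reach 99% ASR, which significantly lowers the sample requirement and technical barrier of the attack and shows stronger robustness and adaptability across different attack scenarios.
Abstract: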
Objective: The proliferation of Deep Neural Networks (DNNs) in safety-critical domains such as autonomous driving and biomedical diagnostics has raised serious concerns about their vulnerability to adversarial threats, particularly backdoor attacks. In these attacks, hidden triggers are embedded during training, causing models to behave normally on clean inputs while producing malicious outputs when specific triggers are present. Existing backdoor methods mainly operate in either the spatial domain or the frequency domain, but they face a fundamental tradeoff between Attack Success Rate (ASR) and stealth. Spatial triggers often introduce visible artifacts, whereas frequency-domain amplitude perturbations disrupt spectral energy distributions and can therefore be detected by advanced defenses such as spectral anomaly detection. This study addresses the need for a backdoor paradigm that simultaneously achieves high attack performance, minimal perceptual distortion, and robustness against state-of-the-art defense methods. The objective is to develop a frequency-domain backdoor attack based on phase manipulation, which is better aligned with human visual perception and structural consistency, thereby overcoming the limitations of existing methods.

Methods: FDPS integrates frequency-domain phase manipulation, perceptual similarity screening, and standard data poisoning. The method first converts input images from RGB to Y'CbCr color space. This conversion isolates the chrominance channels while preserving the luminance component. Discrete Fourier Transform (DFT) is then applied to the chrominance components to obtain complex frequency spectra. Phase information is computed with the atan2 function, and selected high-frequency components are shifted to embed the trigger. Image reconstruction is performed through Inverse Discrete Fourier Transform (IDFT). The framework further incorporates Learned Perceptual Image Patch Similarity (LPIPS) filtering. This filter removes generated samples that do not satisfy the similarity threshold. The screening process ensures that all retained triggers remain visually imperceptible. The accepted poisoned samples are assigned the target class labels and then combined with the clean training data according to standard protocols.

Results and Discussions: FDPS achieves near-perfect ASR, reaching 99%, while maintaining Benign Accuracy (BA) across three datasets and two network architectures (Table 1). The method embeds triggers by manipulating phase information in the Cb and Cr chrominance channels through Fourier transforms, and LPIPS filtering helps preserve visual stealth. Experimental results show that poisoned images retain semantic focus, as confirmed by Grad-CAM visualizations that remain aligned with the clean-image patterns (Fig. 4). The method also shows strong resistance to defense mechanisms. Under Neural Cleanse, FDPS yields an anomaly index of 1.73, which is below the detection threshold of 2 (Figs. 3-5). Under STRIP, the entropy distribution of poisoned samples substantially overlaps with that of clean samples. Additional analysis shows that high-frequency phase perturbation achieves strong attack performance with limited poisoning. In particular, on the GTSRB dataset, FDPS achieves 99% ASR with only 2% poisoned training samples, while minimizing the effect on model utility (Fig. 6; Table 3).

Conclusions: An end-to-end frequency-domain strategy is proposed to embed covert triggers into image classification models while preserving fidelity on clean samples. By shifting selected high-frequency phase components in the chrominance channels and applying LPIPS-based filtering, FDPS achieves 99% ASR with negligible BA loss and minimal visible artifacts. It also evades representative detection methods, including Grad-CAM, Neural Cleanse, Adversarial Neuron Pruning (ANP), and STRIP. These findings indicate that high-frequency phase perturbation constitutes an effective and stealthy backdoor mechanism. Future work should extend this strategy to broader modalities and develop dedicated frequency-domain anomaly detectors as principled countermeasures.
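
The phase-manipulation step described under Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the BT.601 full-range (JPEG-style) Y'CbCr coefficients, the frequency index `(u, v)`, and the shift `delta` are all assumptions chosen for illustration, and the LPIPS screening step is omitted because it needs a pretrained perceptual model. The sketch also shifts the conjugate-symmetric partner of `(u, v)` in the opposite direction so that the inverse transform stays real, a detail the source does not specify.

```python
import numpy as np


def rgb_to_ycbcr(rgb):
    """Convert float RGB in [0, 255] to Y'CbCr (assumed BT.601 full range)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)


def ycbcr_to_rgb(ycc):
    """Inverse of rgb_to_ycbcr (same assumed coefficients)."""
    y, cb, cr = ycc[..., 0], ycc[..., 1] - 128.0, ycc[..., 2] - 128.0
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return np.stack([r, g, b], axis=-1)


def shift_phase(channel, u, v, delta):
    """Add `delta` radians to the phase at DFT index (u, v) of one channel.

    (u, v) and delta are hypothetical parameters. The amplitude spectrum is
    left as-is; only the phase at (u, v) and at its conjugate partner changes.
    """
    spectrum = np.fft.fft2(channel)
    amplitude = np.abs(spectrum)
    phase = np.arctan2(spectrum.imag, spectrum.real)  # atan2-based phase

    m, n = channel.shape
    phase[u, v] += delta
    mu, mv = (-u) % m, (-v) % n  # conjugate-symmetric partner index
    if (mu, mv) != (u, v):
        # Opposite shift keeps the spectrum Hermitian, so the IDFT is real.
        phase[mu, mv] -= delta

    shifted = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(shifted))


def embed_trigger(rgb, u, v, delta):
    """Apply the phase shift to Cb and Cr only, then return a uint8 RGB image."""
    ycc = rgb_to_ycbcr(rgb.astype(np.float64))
    for c in (1, 2):  # chrominance channels; luminance (index 0) is untouched
        ycc[..., c] = shift_phase(ycc[..., c], u, v, delta)
    return np.clip(np.rint(ycbcr_to_rgb(ycc)), 0, 255).astype(np.uint8)
```

For a frequency index that is not its own conjugate partner, the amplitude spectrum of each chrominance channel is preserved up to floating-point error, which is the property the abstract contrasts with amplitude-perturbation triggers. Rounding and clipping back to 8-bit RGB introduce small additional changes that this sketch does not compensate for.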
Key words: Backdoor attacks; Deep neural networks; Image classification; Frequency domain
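Table 1. Attack success rate (ASR) and benign accuracy (BA) of different baseline methods on three datasets

| Model | Method | CIFAR10 BA (%) | CIFAR10 ASR (%) | GTSRB BA (%) | GTSRB ASR (%) | CINIC10 BA (%) | CINIC10 ASR (%) |
|---|---|---|---|---|---|---|---|
| ResNet18 | Clean | 95.00 | - | 99.20 | - | 88.02 | - |
| ResNet18 | BadNets | 94.02 | 99.42 | 99.13 | 99.52 | 85.77 | 99.26 |
| ResNet18 | Blend | 94.26 | 99.87 | 99.02 | 99.77 | 85.82 | 99.89 |
| ResNet18 | WaNet | 94.22 | 99.74 | 99.10 | 99.73 | 84.87 | 89.61 |
| ResNet18 | Ftrojan | 94.10 | 99.26 | 99.01 | 98.90 | 85.90 | 99.70 |
| ResNet18 | DUBA | 94.55 | 99.68 | 99.17 | 99.92 | 87.85 | 99.24 |
| ResNet18 | FDPS | 94.02 | 99.33 | 99.15 | 99.12 | 86.11 | 99.13 |
| RepVGG | Clean | 91.20 | - | 99.40 | - | 87.24 | - |
| RepVGG | BadNets | 91.07 | 99.10 | 99.34 | 99.67 | 86.78 | 99.19 |
| RepVGG | Blend | 91.10 | 98.74 | 99.13 | 99.36 | 85.76 | 98.02 |
| RepVGG | WaNet | 91.07 | 97.95 | 99.13 | 99.38 | 86.16 | 88.76 |
| RepVGG | Ftrojan | 90.08 | 99.14 | 99.29 | 99.26 | 85.82 | 97.84 |
| RepVGG | DUBA | 91.18 | 99.78 | 99.34 | 99.78 | 87.01 | 99.02 |
| RepVGG | FDPS | 91.00 | 99.19 | 99.30 | 99.35 | 86.22 | 99.11 |

Table 2. Comparison of SSIM and LPIPS for five backdoor attack methods on three datasets

| Dataset | BadNets SSIM | BadNets LPIPS | Blend SSIM | Blend LPIPS | WaNet SSIM | WaNet LPIPS | Ftrojan SSIM | Ftrojan LPIPS | DUBA SSIM | DUBA LPIPS | FDPS SSIM | FDPS LPIPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CIFAR10 | 0.972 | 0.0093 | 0.864 | 0.1319 | 0.953 | 0.0099 | 0.954 | 0.0125 | 0.977 | 0.0081 | 0.972 | 0.0058 |
| GTSRB | 0.968 | 0.0347 | 0.879 | 0.1569 | 0.950 | 0.0528 | 0.912 | 0.0598 | 0.972 | 0.0201 | 0.975 | 0.0112 |
| CINIC10 | 0.978 | 0.0065 | 0.887 | 0.1832 | 0.941 | 0.0951 | 0.971 | 0.0754 | 0.968 | 0.0072 | 0.972 | 0.0052 |

Table 3. Comparison of benign accuracy (BA) and attack success rate (ASR) at different frequency positions (%)

| Frequency | CIFAR10 BA | CIFAR10 ASR | GTSRB BA | GTSRB ASR | CINIC10 BA | CINIC10 ASR |
|---|---|---|---|---|---|---|
| ($ u=\dfrac{1}{2}M,v=\dfrac{1}{2}N $) | 94.02 | 99.33 | 99.15 | 99.12 | 86.11 | 99.13 |
| ($ u=\dfrac{3}{4}M,v=\dfrac{3}{4}N $) | 93.55 | 98.37 | 99.10 | 98.75 | 85.97 | 97.94 |
| ($ u=M,v=N $) | 93.11 | 94.18 | 94.18 | 95.47 | 85.49 | 87.55 |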
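
Tables 1 and 3 report BA and ASR as percentages. As a rough illustration of how these two metrics are commonly computed from model predictions, the sketch below uses hypothetical helper names and toy arrays. Excluding test samples whose true label already equals the target class when computing ASR is a common convention and is assumed here; the source does not state which convention it follows.

```python
import numpy as np


def benign_accuracy(clean_preds, labels):
    """BA (%): share of clean test samples classified correctly."""
    clean_preds = np.asarray(clean_preds)
    labels = np.asarray(labels)
    return 100.0 * np.mean(clean_preds == labels)


def attack_success_rate(triggered_preds, labels, target):
    """ASR (%): share of triggered samples classified as the target class.

    Assumption: samples whose true label is already `target` are excluded,
    since they would count as successes even without a backdoor.
    """
    triggered_preds = np.asarray(triggered_preds)
    labels = np.asarray(labels)
    mask = labels != target
    return 100.0 * np.mean(triggered_preds[mask] == target)


# Toy example with made-up predictions (not data from the paper).
labels = np.array([0, 1, 2, 1, 0])
clean_preds = np.array([0, 1, 2, 0, 0])
triggered_preds = np.array([1, 1, 1, 1, 2])

ba = benign_accuracy(clean_preds, labels)
asr = attack_success_rate(triggered_preds, labels, target=1)
```

In this toy case four of five clean predictions are correct, and two of the three non-target samples are redirected to the target class.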