LI Zilong, YANG Gaoming, HAN Dongyu, FANG Xianjin. Cross-Domain Deepfake Detection with Dynamic Artifacts Tracking and Spatial-Frequency Interaction Analysis[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251290

Cross-Domain Deepfake Detection with Dynamic Artifacts Tracking and Spatial-Frequency Interaction Analysis

doi: 10.11999/JEIT251290 cstr: 32379.14.JEIT251290
Funds:  The National Natural Science Foundation of China (52374155), Anhui Provincial Natural Science Foundation (2308085MF218), Natural Science Research Project of Colleges and Universities in Anhui Province (2022AH040113), Medical Special Cultivation Project of Anhui University of Science and Technology (YZ2023H2B007)
  • Accepted Date: 2026-05-12
  • Revision Received Date: 2026-05-12
  • Available Online: 2026-06-05
Objective  The rapid development of generative adversarial networks (GANs) and diffusion models has sharply increased the number of fake images, whose widespread dissemination poses a potential and unpredictable threat to individuals, societies, and nations. Efficient and highly generalizable deepfake detection methods are therefore needed, and cross-domain detection capability has become a core task in current forgery detection research. However, existing detection methods still have three problems: feature extraction relies on specific artifacts or fixed parameters, the spatial and frequency modalities are often learned in isolation without a dynamic interaction mechanism, and global feature association is insufficient. To address these limitations, a Pyramidal Interactive Dual-Stream Network (PIDSNet) that integrates dynamic artifact tracking and spatial-frequency interaction analysis is proposed.

Methods  PIDSNet is built around two branches, one in the spatial domain and one in the frequency domain (Fig. 1), with four modules working together: the Multi-Branch Feature Extraction (MBFE) module, the Frequency Domain Feature Extraction (FDFE) module, the Pyramid Spatial-Frequency Interaction (PSFI) module, and the Multi-Head Pyramid Squeezing Attention (MHPSA) module. The MBFE module (Fig. 2), the basic unit of the spatial branch, uses multi-level, multi-branch dilated convolutions to avoid information loss as the receptive field grows, and so extracts global and local features jointly. The FDFE module, the core of the frequency branch, fuses the MBFE module with spectral convolutions to mine frequency-domain artifact features dynamically. This reduces the dependence of traditional frequency-domain methods on fixed parameters and frequency bands and significantly improves the model's ability to adaptively capture artifact features from different generative models.
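The multi-branch dilated convolution idea behind the MBFE module can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names, the 3-tap kernel, the dilation rates (1, 2, 4), and the averaging of branches are all assumptions chosen only to show how dilation widens the receptive field without adding kernel weights.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D cross-correlation with zero padding and a dilation rate."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size, dilation):
    """Span of input samples covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def multi_branch(signal, kernel, dilations=(1, 2, 4)):
    """Run parallel dilated branches and average them (hypothetical fusion rule)."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


# A unit impulse at index 4 shows which outputs each branch connects to it.
impulse = [0.0] * 4 + [1.0] + [0.0] * 4
kernel = [1.0, 1.0, 1.0]
for d in (1, 2, 4):
    response = dilated_conv1d(impulse, kernel, d)
    touched = [i for i, v in enumerate(response) if v != 0.0]
    print(d, receptive_field(len(kernel), d), touched)
# 1 3 [3, 4, 5]
# 2 5 [2, 4, 6]
# 4 9 [0, 4, 8]
```

Each branch uses the same three weights, yet the span grows from 3 to 9 samples. Combining the branches gives one output both local and wide context, which is the property the abstract attributes to multi-branch dilated convolutions.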
The PSFI module is the key to interaction between the spatial and frequency branches (Fig. 3). It captures low-frequency global information and high-frequency detail features by constructing a spatial Gaussian pyramid and a frequency Laplacian pyramid. Dynamic weight enhancement at each pyramid level achieves adaptive fusion of spatial-frequency features, forming a dynamic spatial-frequency interaction mechanism. The MHPSA module combines multi-head self-attention (MHSA) with dilated convolution (Fig. 4). It inherits the local detail capture capability of the Pyramid Squeeze Attention (PSA) module while strengthening global feature modeling, thereby improving the model's adaptability and robustness.

Results and Discussions  To verify the cross-domain detection capability of PIDSNet across different generative paradigms, the model is trained on the ProGAN dataset and tested on multiple GAN and diffusion model datasets. First, for GAN generative models, on the ForenSynths test set containing four GANs (Table 3), the average accuracy (Acc.) reaches 95.2%, an improvement of 5.3% and 5.2% over LGrad and FreqNet, respectively. On the GANGen dataset containing nine GANs (Tables 4 and 5), the average Acc. reaches 95.5%, an improvement of 20.1% over F3Net, and the average Acc. and average A.P. improve by 4.1% and 1.3%, respectively, over FreqNet. Second, for diffusion models, tests are conducted on the DiffusionForensics and Ojha datasets. On the DiffusionForensics dataset (Table 6), the average Acc. reaches 95.4%, an improvement of 4.8% and 13.2% over LGrad and FreqNet, respectively. On the Ojha dataset (Table 7), the average Acc. and average A.P. reach 96.1% and 99.4%, a significant improvement. More importantly, PIDSNet has only 2.4M parameters (Table 8) and achieves an average Acc. of 95.7% and an average A.P. of 98.7% across 25 datasets, surpassing the compared methods.
These experiments show that PIDSNet, trained only on the ProGAN dataset, adapts to multiple types of GAN models and also effectively detects diffusion model images whose artifact features differ significantly in the spatial and frequency domains, demonstrating strong generalization across models and across generative paradigms. Moreover, Grad-CAM visualizations (Fig. 5) indicate that PIDSNet performs strongly on face images even though it was not trained on them.

Conclusions  This paper addresses the weak domain adaptability and poor generalization of current GAN and diffusion model detection methods, which stem from feature extraction that relies on domain-specific artifacts or fixed parameters and from weak modal interaction. A spatial-frequency collaborative learning framework and a dynamic artifact mining mechanism are constructed to reduce this reliance and to strengthen the extraction of general forgery features. The model's effectiveness is validated on image datasets generated by 25 different GAN and diffusion models. Compared with current state-of-the-art models, the average Acc. and A.P. are significantly improved, confirming good performance in cross-domain forgery detection tasks. However, the experiments also reveal limitations. For specific models whose high-frequency energy distribution is very close to that of real images (such as S3GAN), there is still room for improvement, and the frequency-domain feature mining mechanism needs further optimization.
Future work will therefore focus on two aspects. First, the frequency-domain feature extraction mechanism will be further optimized to better identify forged samples whose high-frequency energy features are close to those of real images. Second, detection of low-quality forged images with compression distortion and noise interference will be improved, and artifact separation and detection methods for forged images generated by multiple models will be studied, to enhance the model's adaptability in complex real-world environments.
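The Gaussian and Laplacian pyramids named in the Methods can be sketched on a one-dimensional signal. This is only an illustration under stated assumptions, not PIDSNet's PSFI module. The helper names, the [1, 2, 1]/4 blur, the nearest-neighbour upsampling, and the per-level `weights` argument are hypothetical. The weights stand in for the learned dynamic weights, and the abstract does not say how they are computed or how the Laplacian pyramid is applied in the frequency branch.

```python
def blur(x):
    """Smooth with a [1, 2, 1] / 4 binomial kernel, replicating edge samples."""
    n = len(x)
    return [(x[max(i - 1, 0)] + 2 * x[i] + x[min(i + 1, n - 1)]) / 4 for i in range(n)]


def downsample(x):
    """Blur, then keep every second sample."""
    return blur(x)[::2]


def upsample(x, size):
    """Nearest-neighbour upsampling back to `size` samples."""
    return [x[min(i // 2, len(x) - 1)] for i in range(size)]


def gaussian_pyramid(x, levels):
    """Progressively smoother, coarser copies: the low-frequency view."""
    pyramid = [list(x)]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def laplacian_pyramid(x, levels):
    """Per-level detail lost by downsampling: the high-frequency view."""
    gauss = gaussian_pyramid(x, levels)
    lap = []
    for fine, coarse in zip(gauss, gauss[1:]):
        up = upsample(coarse, len(fine))
        lap.append([f - u for f, u in zip(fine, up)])
    lap.append(gauss[-1])  # coarsest level is kept as-is
    return lap


def reconstruct(lap, weights=None):
    """Collapse the pyramid; `weights` (hypothetical) rescale each detail level."""
    if weights is None:
        weights = [1.0] * (len(lap) - 1)
    current = lap[-1]
    for detail, w in zip(reversed(lap[:-1]), reversed(weights)):
        up = upsample(current, len(detail))
        current = [w * d + u for d, u in zip(detail, up)]
    return current


signal = [0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 0.0, 0.0]
lap = laplacian_pyramid(signal, levels=3)
restored = reconstruct(lap)                         # unit weights recover the input
sharpened = reconstruct(lap, weights=[2.0, 1.0])    # emphasise the finest detail
```

With unit weights the pyramid collapses back to the input up to floating-point rounding. A weight above 1 on a level amplifies the detail stored there, which is one simple way to read "dynamic weight enhancement at each level".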
References
    [1]
    LIU Zhian, LI Maomao, ZHANG Yong, et al. Fine-grained face swapping via regional GAN inversion[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 8578–8587. doi: 10.1109/CVPR52729.2023.00829.
    [2]
    ZHAO Wenliang, RAO Yongming, SHI Weikang, et al. DiffSwap: High-fidelity and controllable face swapping via 3D-aware masked diffusion[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 8568–8577. doi: 10.1109/CVPR52729.2023.00828.
    [3]
    BALIAH S, LIN Qinliang, LIAO Shengcai, et al. Realistic and efficient face swapping: A unified approach with diffusion models[C]. 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, USA, 2025: 1062–1071. doi: 10.1109/WACV61041.2025.00112.
    [4]
    YUAN Shuaiwei, DONG Junyu, and LI Yuezun. Where the devil hides: Deepfake detectors can no longer be trusted[C]. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 8764–8774. doi: 10.1109/CVPR52734.2025.00819.
    [5]
    HUANG Zhenglin, HU Jinwei, LI Xiangtai, et al. SIDA: Social media image deepfake detection, localization and explanation with large multimodal model[C]. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 28831–28841. doi: 10.1109/CVPR52734.2025.02685.
    [6]
    DING Feng, KUANG Rensheng, ZHOU Yue, et al. A survey of deepfake and related digital forensics[J]. Journal of Image and Graphics, 2024, 29(2): 295–317. doi: 10.11834/jig.230088. (in Chinese)
    [7]
    CAO Junyi, MA Chao, YAO Taiping, et al. End-to-end reconstruction-classification learning for face forgery detection[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 4113–4122. doi: 10.1109/CVPR52688.2022.00408.
    [8]
    SHIOHARA K and YAMASAKI T. Detecting deepfakes with self-blended images[C]. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, USA, 2022: 18720–18729. doi: 10.1109/CVPR52688.2022.01816.
    [9]
    ZHANG Jing, XU Pan, LIU Wenjun, et al. Negative instance generation for cross-domain facial forgery detection[J]. Journal of Image and Graphics, 2025, 30(2): 421–434. doi: 10.11834/jig.240160. (in Chinese)
    [10]
    YAN Zhiyuan, LUO Yuhao, LYU Siwei, et al. Transcending forgery specificity with latent space augmentation for generalizable deepfake detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 8984–8994. doi: 10.1109/CVPR52733.2024.00858.
    [11]
    OJHA U, LI Yuheng, and LEE Y J. Towards universal fake image detectors that generalize across generative models[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 24480–24489. doi: 10.1109/CVPR52729.2023.02345.
    [12]
    KASHIANI H, TALEMI N A, and AFGHAH F. FreqDebias: Towards generalizable deepfake detection via consistency-driven frequency debiasing[C]. 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025: 8775–8785. doi: 10.1109/CVPR52734.2025.00820.
    [13]
    TAN Chuangchuang, ZHAO Yao, WEI Shikui, et al. Learning on gradients: Generalized artifacts representation for GAN-generated images detection[C]. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, Canada, 2023: 12105–12114. doi: 10.1109/CVPR52729.2023.01165.
    [14]
    TAN Chuangchuang, LIU Huan, ZHAO Yao, et al. Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection[C]. 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2024: 28130–28139. doi: 10.1109/CVPR52733.2024.02657.
    [15]
    TAN Chuangchuang, ZHAO Yao, WEI Shikui, et al. Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning[C]. Proceedings of the 38th AAAI Conference on Artificial Intelligence, Vancouver, Canada, 2024: 5052–5060. doi: 10.1609/aaai.v38i5.28310.
    [16]
    QIAN Yuyang, YIN Guojun, SHENG Lu, et al. Thinking in frequency: Face forgery detection by mining frequency-aware clues[C]. Proceedings of the 16th European Conference on Computer Vision, Glasgow, UK, 2020: 86–103. doi: 10.1007/978-3-030-58610-2_6.
    [17]
    BINH L M and WOO S. ADD: Frequency attention and multi-view based knowledge distillation to detect low-quality compressed deepfake images[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, Washington, USA, 2022: 122–130. doi: 10.1609/aaai.v36i1.19886.
    [18]
    WANG Bo, WU Xiaohan, WANG Fei, et al. Spatial-frequency feature fusion based deepfake detection through knowledge distillation[J]. Engineering Applications of Artificial Intelligence, 2024, 133: 108341. doi: 10.1016/j.engappai.2024.108341.
    [19]
    SUN Lei, ZHANG Hongmeng, MAO Xiuqing, et al. Super-resolution reconstruction detection method for deepfake hard compressed videos[J]. Journal of Electronics & Information Technology, 2021, 43(10): 2967–2975. doi: 10.11999/JEIT200531. (in Chinese)
    [20]
    WANG Yan, SUN Qindong, RONG Dongzhu, et al. Deepfake video detection on social networks using multi-domain aware driven by common mechanism analysis between artifacts[J]. Journal of Electronics & Information Technology, 2024, 46(9): 3713–3721. doi: 10.11999/JEIT240025. (in Chinese)
    [21]
    WANG Zhendong, BAO Jianmin, ZHOU Wengang, et al. DIRE for diffusion-generated image detection[C]. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023: 22445–22455. doi: 10.1109/ICCV51070.2023.02051.
    [22]
    HOODA A, MANGAOKAR N, FENG R, et al. D4: Detection of adversarial diffusion deepfakes using disjoint ensembles[C]. 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2024: 3812–3822. doi: 10.1109/WACV57701.2024.00377.
    [23]
    LIU Baoping, LIU Bo, DING Ming, et al. Detection of diffusion model-generated faces by assessing smoothness and noise tolerance[C]. 2024 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Toronto, Canada, 2024: 1–6. doi: 10.1109/BMSB62888.2024.10608232.
    [24]
    ZHANG Hu, ZU Keke, LU Jian, et al. EPSANet: An efficient pyramid squeeze attention block on convolutional neural network[C]. Proceedings of the 16th Asian Conference on Computer Vision, Macao, China, 2023: 1161–1177. doi: 10.1007/978-3-031-26313-2_33.
    [25]
    COOLEY J W, LEWIS P A W, and WELCH P D. The fast Fourier transform and its applications[J]. IEEE Transactions on Education, 1969, 12(1): 27–34. doi: 10.1109/TE.1969.4320436.
    [26]
    WANG Shengyu, WANG O, ZHANG R, et al. CNN-generated images are surprisingly easy to spot... for now[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, USA, 2020: 8695–8704. doi: 10.1109/CVPR42600.2020.00872.
    [27]
    FRANK J, EISENHOFER T, SCHÖNHERR L, et al. Leveraging frequency analysis for deep fake image recognition[C]. Proceedings of the 37th International Conference on Machine Learning, 2020: 3247–3258.
    [28]
    JEONG Y, KIM D, MIN S, et al. BiHPF: Bilateral high-pass filters for robust deepfake detection[C]. 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, USA, 2022: 48–57. doi: 10.1109/WACV51458.2022.00293.
    [29]
    JEONG Y, KIM D, RO Y, et al. FrePGAN: Robust deepfake detection using frequency-level perturbations[C]. Proceedings of the 36th AAAI Conference on Artificial Intelligence, Washington, USA, 2022: 1060–1068. doi: 10.1609/aaai.v36i1.19990.