Cross-domain Deepfake Detection with Dynamic Artifact Tracking and Spatio-frequency Interaction Analysis

LI Zilong; YANG Gaoming; HAN Dongyu; FANG Xianjin

doi:10.11999/JEIT251290

LI Zilong, YANG Gaoming, HAN Dongyu, FANG Xianjin. Cross-domain Deepfake Detection with Dynamic Artifact Tracking and Spatio-frequency Interaction Analysis[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT251290

doi: 10.11999/JEIT251290 cstr: 32379.14.JEIT251290

LI Zilong^{1, 2}, YANG Gaoming^{1, 2}, HAN Dongyu^{2}, FANG Xianjin^{2}

1. The First Affiliated Hospital of Anhui University of Science and Technology, Huainan 232001, China
2. School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan 232001, China

Funds: The National Natural Science Foundation of China (52374155), Anhui Provincial Natural Science Foundation (2308085MF218), Natural Science Research Project of Colleges and Universities in Anhui Province (2022AH040113), Medical Special Cultivation Project of Anhui University of Science and Technology (YZ2023H2B007)

Received Date: 2025-12-04
Revised Date: 2026-05-07
Accepted Date: 2026-05-12
Available Online: 2026-06-05

Abstract

Objective: The rapid development of Generative Adversarial Network (GAN) and Diffusion Model (DM) techniques has sharply increased the number of fake images. The wide dissemination of such images poses potential and unpredictable risks to individuals, society, and national security. Efficient and highly generalizable deepfake detection methods are therefore needed. Cross-domain detection has become a central task in deepfake detection. However, existing methods often rely on specific artifacts or fixed parameters for feature extraction. They also learn spatial and frequency modalities separately, lack dynamic interaction mechanisms, and provide insufficient global feature association. To address these limitations, a Pyramid Interactive Dual-Stream Network (PIDSNet) is proposed. This network integrates dynamic artifact tracking with spatio-frequency interaction analysis.

Methods: PIDSNet consists of spatial and frequency branches (Fig. 1) and four cooperative modules: the Multi-Branch Feature Extraction (MBFE) module, the Frequency Domain Feature Extraction (FDFE) module, the Pyramid Spatio-Frequency Interaction (PSFI) module, and the Multi-Head Pyramid Squeeze Attention (MHPSA) module. MBFE (Fig. 2), which serves as the basic unit of the spatial branch, uses multilevel and multibranch dilated convolutions to reduce information loss as the receptive field expands. It extracts global and local features jointly. FDFE, which is central to the frequency branch, combines MBFE with spectral convolution to dynamically identify frequency-domain artifact features. This design reduces the reliance of traditional frequency-domain methods on fixed parameters and frequency bands. It also improves the adaptive capture of artifacts from different generative models. PSFI drives interaction between the two branches (Fig. 3). It constructs a Gaussian pyramid in the spatial domain and a Laplacian pyramid in the frequency domain to capture low-frequency global information and high-frequency details, respectively. Dynamic weighting at each pyramid level supports adaptive spatio-frequency feature fusion and builds a dynamic interaction mechanism. MHPSA integrates Multi-Head Self-Attention (MHSA) with dilated convolution (Fig. 4). It retains the local detail capture ability of the Pyramid Squeeze Attention (PSA) module and strengthens global feature modeling, thereby improving model adaptability and robustness.

Results and Discussions: To evaluate cross-domain detection across different generative paradigms, PIDSNet is trained on the ProGAN subset and tested on multiple GAN and DM datasets. For GAN detection, the mean Accuracy (Acc.) reaches 95.2% on the ForenSynths test set containing four GANs (Table 3). This value is 5.3 and 5.2 percentage points higher than those of LGrad and FreqNet, respectively. On the GANGen dataset containing nine GANs (Tables 4 and 5), the mean Acc. reaches 95.5%. This result represents a 20.1 percentage-point improvement over F3Net. Compared with FreqNet, PIDSNet improves mean Acc. and mean Average Precision (A.P.) by 4.1 and 1.3 percentage points, respectively. For DM detection, tests are conducted on the DiffusionForensics and Ojha datasets. On DiffusionForensics (Table 6), the mean Acc. reaches 95.4%, which is 4.8 and 13.2 percentage points higher than those of LGrad and FreqNet, respectively. On Ojha (Table 7), the mean Acc. and mean A.P. reach 96.1% and 99.4%, respectively. More importantly, PIDSNet has only 2.4M parameters (Table 8) and achieves mean Acc. and mean A.P. values of 95.7% and 98.7% across 25 datasets, outperforming competing methods. These experiments indicate that PIDSNet, although trained only on the ProGAN subset, adapts to multiple GAN types and effectively detects DM-generated images with different spatial and frequency artifact characteristics. This confirms its strong cross-model and cross-paradigm generalization ability. In addition, Gradient-weighted Class Activation Mapping (Grad-CAM) visualizations indicate that PIDSNet identifies detection-relevant regions in face images, although face images are absent from the training data (Fig. 5).

Conclusions: This study addresses the weak domain adaptability and poor generalization of current GAN and DM detection methods, which often rely on domain-specific artifacts or fixed parameters and have limited modality interaction. A spatio-frequency collaborative learning framework and a dynamic artifact tracking mechanism are constructed to reduce reliance on specific artifacts and fixed parameters. This design improves the extraction of general forgery features. The effectiveness of PIDSNet is validated on image datasets generated by 25 different GAN and DM models. Compared with current advanced models, the mean Acc. and A.P. are improved, confirming strong performance in cross-domain deepfake detection. However, PIDSNet still has limitations. For specific models such as S3GAN, whose high-frequency energy distribution is close to that of real images, performance can still be improved. Future work should further optimize frequency-domain feature extraction, improve detection under compression distortion and noise interference, and study artifact separation and detection for images generated by multiple models. These directions may further improve model adaptability in complex real-world settings.
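
The abstract states that MBFE uses multibranch dilated convolutions to enlarge the receptive field. As a rough illustration of why dilation helps, the sketch below computes the spatial extent covered by a dilated kernel using the standard relation k_eff = d(k - 1) + 1. The 3x3 kernel size and the dilation rates 1, 2, 3 are assumptions chosen for illustration; the abstract does not give the values used in PIDSNet.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def stacked_receptive_field(layers) -> int:
    """Receptive field of stride-1 convolutions applied in sequence.

    `layers` is an iterable of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += effective_kernel(kernel_size, dilation) - 1
    return rf


# Hypothetical parallel branches: 3x3 kernels with assumed dilation rates.
for dilation in (1, 2, 3):
    extent = effective_kernel(3, dilation)
    print(f"dilation={dilation}: covers {extent}x{extent} with 9 weights")

# Two hypothetical levels in sequence (dilation 1, then dilation 2).
print(stacked_receptive_field([(3, 1), (3, 2)]))  # 7
```

Each branch keeps the same nine weights while covering a wider area, which is the general reason dilated branches can mix local and wider context at low parameter cost.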

Keywords:
- Deepfake detection
- Deep learning
- Computer vision
- Dynamic artifact tracking
- Spatial-frequency interactive analysis
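
The PSFI description in the abstract relies on Gaussian and Laplacian pyramids to separate low-frequency global content from high-frequency detail. The following is a minimal NumPy sketch of the textbook pyramid construction on a single 2-D array, not the authors' implementation. The 5-tap binomial filter, nearest-neighbour upsampling, and all function names are assumptions, and the learned dynamic weighting at each level is not reproduced.

```python
import numpy as np


def blur(img):
    """Separable 5-tap binomial blur with edge-replicated padding."""
    k = np.array([1, 4, 6, 4, 1], dtype=float) / 16.0
    h, w = img.shape
    padded = np.pad(img, 2, mode="edge")
    rows = sum(k[i] * padded[:, i:i + w] for i in range(5))  # horizontal pass
    return sum(k[i] * rows[i:i + h, :] for i in range(5))    # vertical pass


def downsample(img):
    """Blur, then keep every second row and column."""
    return blur(img)[::2, ::2]


def upsample(img, shape):
    """Repeat pixels 2x, crop to the target shape, then blur."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return blur(up[:shape[0], :shape[1]])


def gaussian_pyramid(img, levels):
    """Progressively smoothed, coarser copies (low-frequency content)."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def laplacian_pyramid(img, levels):
    """Band-pass residuals (high-frequency detail) plus the coarsest level."""
    g = gaussian_pyramid(img, levels)
    lap = [g[i] - upsample(g[i + 1], g[i].shape) for i in range(levels - 1)]
    lap.append(g[-1])
    return lap


def reconstruct(lap):
    """Invert the Laplacian pyramid by adding detail back, coarse to fine."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = band + upsample(img, band.shape)
    return img


rng = np.random.default_rng(0)
image = rng.random((16, 12))
bands = laplacian_pyramid(image, levels=3)
print([b.shape for b in bands])
print(np.allclose(reconstruct(bands), image))
```

Because each Laplacian level stores exactly what the next coarser Gaussian level cannot represent, summing the levels back recovers the input, which makes the decomposition a convenient basis for level-wise fusion.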
