DONG Qingwei, FU Xueting, ZHANG Benkui. MCL-PhishNet: A Multi-Modal Contrastive Learning Network for Phishing URL Detection[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250758

MCL-PhishNet: A Multi-Modal Contrastive Learning Network for Phishing URL Detection

doi: 10.11999/JEIT250758 cstr: 32379.14.JEIT250758
  • Received Date: 2025-08-19
  • Accepted Date: 2025-12-03
  • Revision Received Date: 2025-12-03
  • Available Online: 2025-12-09
  •   Objective  Phishing attacks are growing more complex and evolve rapidly, and traditional detection methods struggle with feature redundancy, multi-modal mismatch, and limited robustness to adversarial samples.
      Methods  MCL-PhishNet is proposed as a multi-modal contrastive learning framework that detects phishing URLs through a hierarchical syntactic encoder, a bidirectional cross-modal attention mechanism, and a curriculum contrastive learning strategy. Multi-scale residual convolutions and Transformers jointly model the local grammatical patterns and global dependencies of URLs, while a 17-dimensional statistical feature set improves robustness to adversarial samples. The dynamic contrastive learning mechanism optimizes the feature-space distribution through online spectral-clustering-based semantic subspace partitioning and boundary-margin constraints.
      Results and Discussions  MCL-PhishNet performs consistently across datasets (accuracy of 99.41% on EBUU17, 99.41% on PhishStorm, and 99.30% on Kaggle), which supports its generalization capability. The three datasets differ significantly in sample distribution, attack types, and feature dimensions, yet performance remains stable and high, indicating that the multi-modal contrastive learning framework adapts well across scenarios. Compared with methods optimized for specific datasets, the approach avoids overfitting to a particular dataset distribution through end-to-end learning and an adaptive feature fusion mechanism.
      Conclusions  This paper addresses core challenges in phishing URL detection, namely the difficulty of modeling dynamic syntax patterns, multi-modal feature mismatch, and insufficient adversarial robustness, by proposing the multi-modal contrastive learning framework MCL-PhishNet. Through a collaborative mechanism of hierarchical syntax encoding, dynamic semantic distillation, and curriculum optimization, it achieves 99.41% accuracy and a 99.65% F1 score on datasets such as EBUU17 and PhishStorm, improving on existing state-of-the-art methods by 0.27% to 3.76%. Experiments show that the residual convolution-Transformer architecture captures local variation patterns in URLs (such as the numeric substitutions in ‘payp41-log1n.com’) and that the bidirectional cross-modal attention mechanism reduces the false detection rate for path-sensitive parameters to 0.07%. The framework also has limitations. Its hierarchical encoding module (multi-scale CNNs, Transformers, and gated networks) improves detection accuracy but increases the number of model parameters. Moreover, the current model is trained primarily on English-based public datasets, so detection accuracy drops significantly for non-Latin characters (such as Cyrillic domain confusions) and for regional phishing strategies (such as fake URLs targeting local payment platforms).
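To make the idea of a statistical URL feature set concrete, the following is a minimal Python sketch of hand-crafted URL statistics. The abstract does not list the 17 features used by MCL-PhishNet, so the eight features below, their names, and the function `url_statistics` are illustrative assumptions rather than the authors' feature set. The sketch only shows how character-level statistics can expose numeric substitution of the kind seen in ‘payp41-log1n.com’.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches a leading scheme such as "https://" (hypothetical helper pattern).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def url_statistics(url: str) -> dict:
    """Compute a few illustrative statistics for one URL.

    These features are assumptions for demonstration only; they are not
    the 17-dimensional feature set described in the paper.
    """
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    parse_target = url if _SCHEME.match(url) or url.startswith("//") else "//" + url
    parts = urlsplit(parse_target)
    host = parts.hostname or ""

    # Shannon entropy of the character distribution, in bits per character.
    total = len(url)
    counts = Counter(url)
    entropy = (
        -sum((c / total) * math.log2(c / total) for c in counts.values())
        if total
        else 0.0
    )

    return {
        "url_length": float(total),
        "host_length": float(len(host)),
        # Share of digits in the host; digit-for-letter substitution raises it.
        "digit_ratio_host": sum(ch.isdigit() for ch in host) / len(host) if host else 0.0,
        "hyphen_count_host": float(host.count("-")),
        # Number of labels beyond "name.tld" (a rough proxy, ignores multi-part TLDs).
        "subdomain_depth": float(max(host.count(".") - 1, 0)),
        "path_depth": float(sum(1 for seg in parts.path.split("/") if seg)),
        "query_param_count": float(len([p for p in parts.query.split("&") if p])),
        "char_entropy": entropy,
    }


if __name__ == "__main__":
    for example in ("paypal.com", "https://payp41-log1n.com/secure/login?id=7&next=home"):
        print(example, url_statistics(example))
```

In a model of the kind the abstract describes, a vector like this would be one input modality alongside the learned character-level representation; how the two are fused is not specified in the abstract and is not shown here.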
    Figures(6)  / Tables(2)
