Combine the Pre-trained Model with Bidirectional Gated Recurrent Units and Graph Convolutional Network for Adversarial Word Sense Disambiguation

ZHANG Chunxiang, SUN Ying, GAO Kexin, GAO Xueyao

Citation: ZHANG Chunxiang, SUN Ying, GAO Kexin, GAO Xueyao. Combine the Pre-trained Model with Bidirectional Gated Recurrent Units and Graph Convolutional Network for Adversarial Word Sense Disambiguation[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250386


doi: 10.11999/JEIT250386 cstr: 32379.14.JEIT250386
Funds: The National Natural Science Foundation of China (61502124, 60903082), China Postdoctoral Science Foundation (2014M560249), Heilongjiang Provincial Natural Science Foundation of China (LH2022F031, LH2022F030, F2015041, F201420)
Article information
    About the authors:

    ZHANG Chunxiang: Male, Professor. His research interests include natural language processing, machine learning, and graphics and image processing

    SUN Ying: Female, Master's student. Her research interest is natural language processing

    GAO Kexin: Male, Master's student. His research interest is natural language processing

    GAO Xueyao: Female, Professor. Her research interests include graphics and image processing, natural language processing, and machine learning

    Corresponding author:

    GAO Xueyao, xueyao_gao@163.com

  • CLC number: TN919.8; TP391.1

  • Abstract: Word Sense Disambiguation (WSD) is a key technology for improving a computer's natural language understanding and is widely used in machine translation, information retrieval, and other fields. To address the shortcomings of existing models in generalization and robustness, this paper proposes a WSD model that combines a pre-trained model with Bidirectional Gated Recurrent Units (BiGRU), Cross-Attention (CA) and a Graph Convolutional Network (GCN), and introduces Adversarial Training (AT) to optimize it. The word forms, parts of speech and semantic categories of the words to the left and right of the ambiguous word are used as disambiguation features and fed into LERT to obtain dynamic word vectors. Cross-attention then fuses the global semantic information of the token sequence and the local semantic information of the CLS sequence extracted by the BiGRU network, producing a more complete sentence-node representation for the disambiguation feature graph. The disambiguation feature graph is passed through the graph convolution to update the feature information between nodes, and an interpolation prediction layer and a semantic classification layer determine the true semantic category of the ambiguous word. The gradient of the input dynamic word vectors is computed to generate subtle continuous perturbations, which are added to the original word-vector matrix to produce adversarial examples. The loss of the fusion network and the loss from adversarial training on these examples are combined to optimize the disambiguation model. Experimental results show that the method not only strengthens the model's ability to handle complex lexical ambiguity, but also effectively improves its robustness and generalization, yielding better disambiguation performance.
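
The pipeline sketched in the abstract (LERT vectors → BiGRU encoding → cross-attention fusion → GCN propagation → classification) can be pictured with a minimal Python sketch. This is not the paper's implementation: the hidden sizes, the use of torch.nn.MultiheadAttention for the cross-attention, the mean-pooled sentence node, the batch-as-graph simplification and the omission of the interpolation prediction layer are assumptions made only for illustration.

    import torch
    import torch.nn as nn

    class FusionEncoder(nn.Module):
        """Illustrative sketch: BiGRU + cross-attention fusion feeding one GCN layer."""
        def __init__(self, dim=768, hidden=256, num_classes=4):
            super().__init__()
            # BiGRU over the LERT token vectors (global semantics of the token sequence).
            self.token_gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
            # BiGRU over the CLS sequence (local semantics).
            self.cls_gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
            # Cross-attention: the CLS-side features query the token-side features.
            self.cross_attn = nn.MultiheadAttention(2 * hidden, num_heads=4, batch_first=True)
            # A single dense GCN layer: normalized adjacency x node features x weight matrix.
            self.gcn_weight = nn.Linear(2 * hidden, 2 * hidden)
            self.classifier = nn.Linear(2 * hidden, num_classes)

        def forward(self, token_vecs, cls_vecs, adj_norm):
            # token_vecs: (B, L, dim) dynamic word vectors from LERT
            # cls_vecs:   (B, S, dim) CLS sequence
            # adj_norm:   (B, B) normalized adjacency; here the batch itself plays the
            #             role of the sentence nodes of the disambiguation feature graph
            g, _ = self.token_gru(token_vecs)        # global token features (B, L, 2*hidden)
            c, _ = self.cls_gru(cls_vecs)            # local CLS features (B, S, 2*hidden)
            fused, _ = self.cross_attn(c, g, g)      # cross-attention fusion
            sent_node = fused.mean(dim=1)            # sentence-node representation (B, 2*hidden)
            h = torch.relu(adj_norm @ self.gcn_weight(sent_node))   # one round of propagation
            return self.classifier(h)                # semantic-class logits

In this reading, the CLS-derived features act as queries over the BiGRU-encoded token sequence, so each sentence node carries both the local and the global view before graph propagation.
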
  • Figure 1  The LBGCA-GCN disambiguation model

    Figure 2  Framework of the adversarial-training-based WSD model, LBGCA-GCN-AT

    Figure 3  Comparison of disambiguation performance with different sequence fusion methods

    Table 1  Description of the adversarial training algorithms

    Algorithm     Description
    Input: the word vector T output by LERT
    FGSM / FGM    For each T:
                  ① Compute the forward loss on T and backpropagate to obtain the gradient $ \nabla_T L(\theta, T, y) $;
                  ② Compute the perturbation r_adv for the input sample T and generate the adversarial example T_adv = T + r_adv;
                  ③ Compute the forward loss on T_adv and accumulate the backpropagated gradient $ \nabla_T L(\theta, T, y) $ onto the gradient from step ①;
                  ④ Restore the vectors output by LERT to their values at step ①;
                  ⑤ Update the parameters with the gradient from step ③.
    PGD / FreeAT  For each T:
                  ① Compute the forward loss on T, backpropagate to obtain the gradient $ \nabla_T L(\theta, T, y) $, and back it up;
                  For each step t:
                  ② Compute the perturbation T_t for the input sample T and generate the adversarial example T_adv = T_t;
                  ③ When t != k: zero the gradients, compute the forward loss on T_adv, and backpropagate to obtain the gradient $ \nabla_T L(\theta, T, y) $;
                  ④ When t == k: restore the gradient from step ①, compute g(T_t) and accumulate it onto step ①;
                  ⑤ Restore the vectors output by LERT to their values at step ①;
                  ⑥ Update the parameters with the gradient from step ④.
    FreeLB        Starting from PGD, FreeLB changes step ④ to: when t == k, restore the gradient from step ①, compute the average gradient $ \nabla_T L(\theta, T, y)/k $ and accumulate it onto step ①.
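
As a complement to the step lists in Table 1, the following sketch mirrors the PGD-style inner loop under stated assumptions: PyTorch; a generic `model` that maps input vectors to logits; illustrative values for the step size alpha, the radius epsilon and the step count K; and whole-tensor L2 norms instead of per-example norms for brevity.

    import torch
    import torch.nn.functional as F

    def pgd_training_step(model, embeddings, labels, optimizer, K=3, alpha=0.3, epsilon=1.0):
        """One parameter update with a K-step perturbation loop on the input vectors."""
        emb0 = embeddings.detach()                    # backup of the LERT output vectors T
        optimizer.zero_grad()

        # Step 1: forward loss on the clean vectors; its gradient stays on the parameters.
        F.cross_entropy(model(emb0), labels).backward()

        r = torch.zeros_like(emb0)                    # accumulated perturbation
        for t in range(1, K + 1):
            emb_adv = (emb0 + r).requires_grad_(True) # adversarial example T_adv = T + r
            adv_loss = F.cross_entropy(model(emb_adv), labels)

            if t < K:
                # Intermediate steps only refine r; parameter gradients stay untouched
                # (Table 1 zeroes and later restores them, which amounts to the same thing).
                g, = torch.autograd.grad(adv_loss, emb_adv)
                r = r + alpha * g / (g.norm() + 1e-12)   # ascent on the perturbation
                if r.norm() > epsilon:                   # project back onto the
                    r = epsilon * r / r.norm()           # L2 ball of radius epsilon
            else:
                # Final step: accumulate the adversarial gradient onto the clean gradient.
                adv_loss.backward()

        optimizer.step()                              # update with clean + adversarial gradients

FGSM and FGM reduce the loop to a single perturbation step, while FreeLB keeps the adversarial loss of every step and averages the resulting gradients over K (the $ \nabla_T L(\theta, T, y)/k $ term in Table 1) before the parameter update.
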

    Table 2  Effect of the adversarial training algorithms on model performance

    Algorithm   SemEval-2007: Task#5 corpus            HealthWSD corpus
                A_mar   F1_mar  P_mar   R_mar          A_mar   F1_mar  P_mar   R_mar
    FGSM        0.7590  0.7068  0.7198  0.7573         0.9636  0.9561  0.9574  0.9575
    FGM         0.7686  0.7174  0.7244  0.7528         0.9633  0.9529  0.9510  0.9581
    PGD         0.7711  0.7150  0.7234  0.7560         0.9674  0.9582  0.9533  0.9605
    FreeAT      0.7616  0.7087  0.7188  0.7548         0.9684  0.9570  0.9567  0.9606
    FreeLB      0.8069  0.7802  0.7804  0.8104         0.9694  0.9591  0.9590  0.9615

    Table 3  Comparison of disambiguation performance with different numbers of perturbation steps

    Algorithm  Steps   SemEval-2007: Task#5 corpus            HealthWSD corpus
                       A_mar   F1_mar  P_mar   R_mar          A_mar   F1_mar  P_mar   R_mar
    PGD        2       0.7514  0.7094  0.7154  0.7566         0.9617  0.9505  0.9510  0.9537
               3       0.7563  0.7018  0.7117  0.7433         0.9670  0.9564  0.9544  0.9613
               4       0.7711  0.7150  0.7234  0.7560         0.9674  0.9582  0.9533  0.9605
    FreeAT     2       0.7568  0.7067  0.7141  0.7571         0.9656  0.9572  0.9572  0.9610
               3       0.7491  0.7066  0.7131  0.7472         0.9630  0.9527  0.9500  0.9600
               4       0.7616  0.7087  0.7188  0.7548         0.9684  0.9570  0.9567  0.9606
    FreeLB     2       0.7635  0.7191  0.7268  0.7659         0.9624  0.9483  0.9472  0.9543
               3       0.8069  0.7802  0.7804  0.8104         0.9694  0.9591  0.9590  0.9615
               4       0.7665  0.7230  0.7284  0.7656         0.9667  0.9527  0.9516  0.9593

    Table 4  Ablation experiments

    Model           SemEval-2007: Task#5 corpus            HealthWSD corpus
                    A_mar   F1_mar  P_mar   R_mar          A_mar   F1_mar  P_mar   R_mar
    LERT            0.7469  0.7059  0.7124  0.7535         0.9338  0.9194  0.9178  0.9244
    LBG             0.7680  0.7313  0.7376  0.7834         0.9467  0.9320  0.9298  0.9396
    LBG-GCN         0.7845  0.7431  0.7427  0.8028         0.9560  0.9509  0.9499  0.9541
    LBGCA-GCN       0.7976  0.7580  0.7629  0.8043         0.9635  0.9535  0.9515  0.9629
    LBGCA-GCN-AT    0.8069  0.7802  0.7804  0.8104         0.9694  0.9591  0.9590  0.9615

    Table 5  Comparison experiments

    Model           SemEval-2007: Task#5 corpus            HealthWSD corpus
                    A_mar   F1_mar  P_mar   R_mar          A_mar   F1_mar  P_mar   R_mar
    BiLSTM          0.6597  0.5532  0.6099  0.5806         0.7390  0.6317  0.6864  0.6451
    TextCNN         0.6606  0.5952  0.6308  0.6196         0.8503  0.7779  0.8470  0.7746
    TextGCN         0.6713  0.6178  0.6347  0.6513         0.8757  0.8237  0.8091  0.8752
    GraphSAGE       0.6587  0.6029  0.6273  0.6104         0.8289  0.7954  0.8273  0.7885
    BERT            0.7408  0.7004  0.7028  0.7579         0.9196  0.8983  0.8997  0.9023
    RoBERTa         0.7429  0.7031  0.7082  0.7402         0.9281  0.9156  0.9140  0.9216
    MacBERT         0.7367  0.6994  0.7052  0.7315         0.9223  0.9113  0.9115  0.9152
    LERT            0.7469  0.7059  0.7124  0.7535         0.9338  0.9194  0.9178  0.9244
    jina            0.7472  0.6935  0.7015  0.7327         0.9588  0.9484  0.9432  0.9605
    MRHA            0.7617  0.6846  0.7247  0.6943         0.9010  0.8078  0.8235  0.8106
    LBGCA-GCN-AT    0.8069  0.7802  0.7804  0.8104         0.9694  0.9591  0.9590  0.9615
  • [1] MENTE R, ALAND S, and CHENDAGE B. Review of word sense disambiguation and its approaches[EB/OL]. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4097221, 2022. doi: 10.2139/ssrn.4097221.
    [2] ABRAHAM A, GUPTA B K, MAURYA A S, et al. Naïve Bayes approach for word sense disambiguation system with a focus on parts-of-speech ambiguity resolution[J]. IEEE Access, 2024, 12: 126668–126678. doi: 10.1109/ACCESS.2024.3453912.
    [3] WANG Yue, LIANG Qiliang, YIN Yaqi, et al. Disambiguate words like composing them: A morphology-informed approach to enhance Chinese word sense disambiguation[C]. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Bangkok, Thailand, 2024: 15354–15365. doi: 10.18653/v1/2024.acl-long.819.
    [4] LI Linlin, LI Juxing, WANG Hongli, et al. Application of the transformer model algorithm in Chinese word sense disambiguation: A case study in Chinese language[J]. Scientific Reports, 2024, 14(1): 6320. doi: 10.1038/s41598-024-56976-5.
    [5] WAEL T, ELREFAI E, MAKRAM M, et al. Pirates at arabicNLU2024: Enhancing Arabic word sense disambiguation using transformer-based approaches[C]. Proceedings of the Second Arabic Natural Language Processing Conference, Bangkok, Thailand, 2024: 372–376. doi: 10.18653/v1/2024.arabicnlp-1.31.
    [6] MISHRA B K and JAIN S. Word sense disambiguation for Indic language using Bi-LSTM[J]. Multimedia Tools and Applications, 2024, 84(16): 16631–16656. doi: 10.1007/S11042-024-19499-9.
    [7] LYU Meng and MO Shasha. HSRG-WSD: A novel unsupervised Chinese word sense disambiguation method based on heterogeneous sememe-relation graph[C]. Proceedings of the 19th International Conference on Advanced Intelligent Computing Technology and Applications, Zhengzhou, China, 2023: 623–633. doi: 10.1007/978-981-99-4752-2_51.
    [8] PU Xiao, PAPPAS N, HENDERSON J, et al. Integrating weakly supervised word sense disambiguation into neural machine translation[J]. Transactions of the Association for Computational Linguistics, 2018, 6: 635–649. doi: 10.1162/tacl_a_00242.
    [9] PADWAD H, KESWANI G, BISEN W, et al. Leveraging contextual factors for word sense disambiguation in Hindi language[J]. International Journal of Intelligent Systems and Applications in Engineering, 2024, 12(12s): 129–136.
    [10] LI Zhi, YANG Fan, and LUO Yaoru. Context embedding based on Bi-LSTM in semi-supervised biomedical word sense disambiguation[J]. IEEE Access, 2019, 7: 72928–72935. doi: 10.1109/ACCESS.2019.2912584.
    [11] BARBA E, PROCOPIO L, CAMPOLUNGO N, et al. MuLaN: Multilingual label propagation for word sense disambiguation[C]. Proceedings of the 29th International Joint Conference on Artificial Intelligence, Yokohama, Japan, 2021: 3837–3844. doi: 10.24963/ijcai.2020/531.
    [12] JIA Xiaojun, ZHANG Yong, WU Baoyuan, et al. Boosting fast adversarial training with learnable adversarial initialization[J]. IEEE Transactions on Image Processing, 2022, 31: 4417–4430. doi: 10.1109/TIP.2022.3184255.
    [13] RIBEIRO A H, SCHÖN T B, ZACHARIAH D, et al. Efficient optimization algorithms for linear adversarial training[C]. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics, Mai Khao, Thailand, 2025: 1207–1215.
    [14] LI J W, LIANG Renwei, YEH C H, et al. Adversarial robustness overestimation and instability in TRADES[EB/OL]. https://arxiv.org/abs/2410.07675, 2024.
    [15] CHENG Xiwei, FU Kexin, and FARNIA F. Stability and generalization in free adversarial training[EB/OL]. https://arxiv.org/abs/2404.08980, 2024.
    [16] ZHU Chen, CHENG Yu, GAN Zhe, et al. FreeLB: Enhanced adversarial training for natural language understanding[C]. Proceedings of the 8th International Conference on Learning Representations, Xi’an, China, 2020: 11232–11245.
    [17] BAI Tao, LUO Jinqi, ZHAO Jun, et al. Recent advances in adversarial training for adversarial robustness[C]. Proceedings of the 30th International Joint Conference on Artificial Intelligence, Montreal, Canada, 2021: 4312–4321. doi: 10.24963/ijcai.2021/591.
    [18] ZHANG Liwei. Word sense disambiguation model based on Bi-LSTM[C]. Proceedings of the 2022 14th International Conference on Measuring Technology and Mechatronics Automation, Changsha, China, 2022: 848–851. doi: 10.1109/ICMTMA54903.2022.00172.
    [19] KIM Y. Convolutional neural networks for sentence classification[C]. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 2014: 1746–1751. doi: 10.3115/v1/d14-1181.
    [20] YAO Liang, MAO Chengsheng, and LUO Yuan. Graph convolutional networks for text classification[C]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 7370–7377. doi: 10.1609/aaai.v33i01.33017370.
    [21] HAMILTON W L, YING Z, and LESKOVEC J. Inductive representation learning on large graphs[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017, 30: 1025–1035.
    [22] CUI Yiming, CHE Wanxiang, LIU Ting, et al. Pre-training with whole word masking for Chinese BERT[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3504–3514. doi: 10.1109/TASLP.2021.3124365.
    [23] LIU Yinhan, OTT M, GOYAL N, et al. RoBERTa: A robustly optimized BERT pretraining approach[EB/OL]. https://doi.org/10.48550/arXiv.1907.11692, 2019.
    [24] CUI Yiming, CHE Wanxiang, LIU Ting, et al. Revisiting pre-trained models for Chinese natural language processing[C]. Proceedings of the Findings of the Association for Computational Linguistics: EMNLP, 2020: 657–668. doi: 10.48550/arXiv.2004.13922.
    [25] CUI Yiming, CHE Wanxiang, WANG Shijin, et al. LERT: A linguistically-motivated pre-trained language model[EB/OL]. https://ymcui.com/pdf/lert.pdf, 2022.
    [26] STURUA S, MOHR I, AKRAM M K, et al. jina-embeddings-v3: Multilingual embeddings with task LoRA[EB/OL]. https://arxiv.org/abs/2409.10173, 2024.
    [27] ZHANG Chunxiang, ZHANG Yulong, and GAO Xueyao. Multi-channel residual hybrid dilated convolution with attention for word sense disambiguation[J]. Journal of Beijing University of Posts and Telecommunications, 2024, 47(5): 128–134. doi: 10.13190/j.jbupt.2023-179.
Publication history
  • Received: 2025-05-08
  • Revised: 2025-08-28
  • Available online: 2025-09-02
