Citation: ZHANG Chunxiang, SUN Ying, GAO Kexin, GAO Xueyao. Combine the Pre-trained Model with Bidirectional Gated Recurrent Units and Graph Convolutional Network for Adversarial Word Sense Disambiguation[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250386

Combine the Pre-trained Model with Bidirectional Gated Recurrent Units and Graph Convolutional Network for Adversarial Word Sense Disambiguation

doi: 10.11999/JEIT250386 cstr: 32379.14.JEIT250386
Funds:  The National Natural Science Foundation of China (61502124, 60903082), China Postdoctoral Science Foundation (2014M560249), Heilongjiang Provincial Natural Science Foundation of China (LH2022F031, LH2022F030, F2015041, F201420)
  • Received Date: 2025-05-08
  • Rev Recd Date: 2025-08-28
  • Available Online: 2025-09-02
  •   Objective  In Word Sense Disambiguation (WSD), the Linguistically-motivated Bidirectional Encoder Representation from Transformer (LERT) model is employed to capture rich semantic representations from large-scale corpora, enabling improved contextual understanding of word meanings. However, several challenges remain. Current WSD models are not sufficiently sensitive to temporal and spatial dependencies within sequences, and single-dimensional features are inadequate for representing the diversity of linguistic expressions. To address these limitations, a hybrid network is constructed by integrating LERT, Bidirectional Gated Recurrent Units (Bi-GRU), and a Graph Convolutional Network (GCN). This network enhances the modeling of structured text and contextual semantics. Nevertheless, generalization and robustness remain problematic, so an Adversarial Training (AT) algorithm is applied to improve the overall performance and resilience of the WSD model.
  •   Methods  An adversarial WSD method is proposed based on a pre-trained model, combining Bi-GRU and GCN. First, the word forms, parts of speech, and semantic categories of the words neighboring an ambiguous term are input into the LERT model to obtain the CLS sequence and the token sequence. Second, cross-attention is applied to fuse the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. Sentences, word forms, parts of speech, and semantic categories are then used as nodes to construct a disambiguation feature graph, which is input into the GCN to update the feature information of the nodes. Third, the semantic category of the ambiguous word is determined through the interpolated prediction layer and the semantic classification layer. Fourth, subtle continuous perturbations are generated by computing the gradient of the dynamic word vectors in the input. These perturbations are added to the original word vector matrix to create adversarial samples, which are used to optimize the LERT+BiGRU+CA+GCN (LBGCA-GCN) model. A cross-entropy loss function is applied to measure the performance of the LBGCA-GCN model on the adversarial samples. Finally, the network loss is combined with the AT loss to optimize the LBGCA-GCN model.
  •   Results and Discussions  When the Free Large-Batch (FreeLB) algorithm is applied, stronger adversarial perturbations are generated, and the LBGCA-GCN with AT (LBGCA-GCN-AT) model achieves the best performance (Table 2). As the number of perturbation steps increases, the strength of AT improves; however, once the number of steps exceeds a certain threshold, the LBGCA-GCN-AT model begins to overfit. The FreeLB algorithm demonstrates strong robustness with three perturbation steps (Table 3). The cross-attention mechanism, which fuses the token sequence with the CLS sequence, yields significant performance gains in complex semantic scenarios (Fig. 3). By incorporating AT, the LBGCA-GCN-AT model achieves notable improvements across multiple evaluation metrics (Table 4).
  •   Conclusions  This study presents an adversarial WSD method based on a pre-trained model, integrating Bi-GRU and GCN to address the weak generalization ability and robustness of conventional WSD models. LERT is used to transform discriminative features into dynamic word vectors, while cross-attention fuses the global semantic information extracted by Bi-GRU from the token sequence with the local semantic information derived from the CLS sequence. This fusion generates more complete node representations for the disambiguation feature graph. A GCN is then applied to update the relationships among nodes within the feature graph. The interpolated prediction layer and the semantic classification layer are used to determine the semantic category of ambiguous words. To further improve robustness, the gradient of the dynamic word vector is computed and perturbed to generate adversarial samples, which are used to optimize the LBGCA-GCN model. The network loss is combined with the AT loss to refine the model. Experiments on the SemEval-2007 Task #05 and HealthWSD datasets examine multiple factors affecting model performance, including adversarial algorithms, perturbation steps, and sequence fusion methods. Results demonstrate that introducing AT improves the model's ability to handle real-world noise and perturbations. The proposed method not only enhances robustness and generalization but also strengthens the capacity of WSD models to capture subtle semantic distinctions.
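
The Methods describe fusing the Bi-GRU-encoded token sequence (global semantics) with the CLS sequence (local semantics) through cross-attention. The following is a minimal PyTorch sketch of how such a fusion module might look; the class name CrossAttentionFusion, the single attention head, and all dimensions are illustrative assumptions, not the authors' released implementation.

# Illustrative sketch of the cross-attention fusion step: the LERT token
# sequence is encoded by a Bi-GRU (global semantics) and fused with the CLS
# vector (local semantics) via attention. Names and sizes are assumptions.
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    def __init__(self, hidden_size: int = 768, gru_size: int = 256):
        super().__init__()
        # Bi-GRU over the LERT token sequence; outputs 2 * gru_size per token.
        self.bigru = nn.GRU(hidden_size, gru_size, batch_first=True,
                            bidirectional=True)
        # Project the CLS vector into the same space as the Bi-GRU outputs.
        self.query_proj = nn.Linear(hidden_size, 2 * gru_size)
        self.attn = nn.MultiheadAttention(2 * gru_size, num_heads=1,
                                          batch_first=True)

    def forward(self, token_seq: torch.Tensor, cls_vec: torch.Tensor):
        # token_seq: (batch, seq_len, hidden_size) token sequence from LERT
        # cls_vec:   (batch, hidden_size), the [CLS] representation
        global_feats, _ = self.bigru(token_seq)        # (batch, seq_len, 2*gru_size)
        query = self.query_proj(cls_vec).unsqueeze(1)  # (batch, 1, 2*gru_size)
        fused, _ = self.attn(query, global_feats, global_feats)
        return fused.squeeze(1)                        # (batch, 2*gru_size)


if __name__ == "__main__":
    model = CrossAttentionFusion()
    tokens, cls = torch.randn(4, 32, 768), torch.randn(4, 768)
    print(model(tokens, cls).shape)  # torch.Size([4, 512])

In this sketch the CLS vector acts as the query over the Bi-GRU outputs, so the fused vector retains a sentence-level focus while attending to token-level context.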
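The disambiguation feature graph treats sentences, word forms, parts of speech, and semantic categories as nodes whose features the GCN updates. A single graph-convolution layer in the standard symmetric-normalization form is sketched below under the assumption of a dense adjacency matrix; the graph construction details and the number of stacked layers are not specified in the abstract and are left out here.

# Illustrative single GCN layer updating node features of the disambiguation
# feature graph, using the normalized propagation rule
# D^{-1/2} (A + I) D^{-1/2} X W. The dense adjacency matrix is an assumption.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, in_dim) node features
        # adj: (num_nodes, num_nodes) binary adjacency of the feature graph
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # add self-loops
        deg = a_hat.sum(dim=1)
        d_inv_sqrt = torch.diag(deg.pow(-0.5))
        norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt               # symmetric normalization
        return torch.relu(norm_adj @ self.linear(x))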
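The adversarial step computes the gradient of the loss with respect to the dynamic word vectors, adds a bounded perturbation to the embedding matrix, and combines the resulting loss with the network loss. The sketch below loosely follows the multi-step FreeLB scheme with three ascent steps; the function name, the hyperparameter values, and the assumption that the model accepts inputs_embeds and returns an object with a .loss field (HuggingFace-style) are all illustrative rather than taken from the paper.

# Minimal sketch of a FreeLB-style adversarial update on the input embeddings.
# Hyperparameters and the model interface are assumptions for illustration.
import torch


def freelb_update(model, inputs_embeds, labels, optimizer,
                  steps=3, alpha=1e-2, eps=1e-1):
    """One training update with multi-step adversarial perturbation of the
    input embeddings; gradients from every ascent step are accumulated in the
    model parameters before a single optimizer step."""
    optimizer.zero_grad()
    delta = torch.zeros_like(inputs_embeds, requires_grad=True)
    for _ in range(steps):
        # Forward pass on the perturbed embeddings; dividing by `steps`
        # makes the accumulated gradient approximate the average loss.
        loss = model(inputs_embeds=inputs_embeds + delta,
                     labels=labels).loss / steps
        loss.backward()  # accumulates gradients for the parameters and delta
        with torch.no_grad():
            grad = delta.grad
            # Normalized ascent step along the loss gradient, then clamp
            # delta elementwise so the perturbation stays bounded.
            delta += alpha * grad / (grad.norm() + 1e-12)
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    optimizer.step()

Because delta is initialized to zero, the first forward pass is effectively the clean (unperturbed) loss, so the accumulated gradient already mixes the network loss with the adversarial loss in the spirit of the combined objective described above.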