Volume 45 Issue 6
Jun.  2023
Citation: ZHANG Shunxiang, WU Houyue, ZHU Guangli, XU Xin, SU Mingxing. Character-level Adversarial Samples Generation Approach for Chinese Text Classification[J]. Journal of Electronics & Information Technology, 2023, 45(6): 2226-2235. doi: 10.11999/JEIT220563

Character-level Adversarial Samples Generation Approach for Chinese Text Classification

doi: 10.11999/JEIT220563
Funds: The National Natural Science Foundation of China (62076006), The University Synergy Innovation Program of Anhui Province (GXXT-2021-008), The Graduate Students Scientific Research Project of Anhui Province (YJS20210402)
  • Received Date: 2022-05-07
  • Rev Recd Date: 2022-07-09
  • Available Online: 2022-07-14
  • Publish Date: 2023-06-10
  • Adversarial sample generation is a technique that causes a neural network to make misjudgments by adding small perturbations to the input, and it can be used to test the robustness of text classification models. Existing adversarial sample generation methods for Chinese text rely mainly on substituting traditional-Chinese characters or homophones, and they suffer from large perturbation amplitude and low sample quality. This paper proposes Polyphonic characters Generation Adversarial Sample (PGAS), a character-level adversarial sample generation approach that produces high-quality adversarial samples with only minor perturbation by replacing polyphonic characters. First, a polyphonic-character dictionary is constructed to label polyphonic characters. Then, polyphonic characters in the input text are replaced according to the dictionary. Finally, adversarial attack experiments are conducted against black-box models. Experiments on multiple sentiment classification datasets verify the effectiveness of the proposed method against a variety of state-of-the-art classification models. An illustrative sketch of the substitution step is given after the abstract.
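As a rough illustration of the pipeline summarized in the abstract (dictionary construction, polyphonic-character substitution, and a black-box attack loop), the following Python sketch shows one possible greedy implementation. The dictionary entries, the classify callback, and the generate_adversarial function are hypothetical assumptions made here for illustration; they are not taken from the paper's released code.

```python
# Minimal sketch of a polyphonic-character substitution attack.
# All names and dictionary entries below are illustrative assumptions,
# not the authors' actual implementation.

from typing import Callable, Dict, List

# Toy "polyphonic dictionary": maps a polyphonic Chinese character to
# substitute characters that share one of its pronunciations. A real
# dictionary would be far larger and pinyin-aware.
POLYPHONIC_DICT: Dict[str, List[str]] = {
    "行": ["形", "型"],   # reading xíng
    "长": ["常", "肠"],   # reading cháng
    "乐": ["勒", "了"],   # readings lè / yuè
}

def generate_adversarial(
    text: str,
    classify: Callable[[str], int],   # black-box model: text -> predicted label
    max_substitutions: int = 3,
) -> str:
    """Greedily replace polyphonic characters until the black-box
    prediction changes or the substitution budget is exhausted."""
    original_label = classify(text)
    chars = list(text)
    used = 0
    for i, ch in enumerate(chars):
        if used >= max_substitutions:
            break
        for substitute in POLYPHONIC_DICT.get(ch, []):
            candidate = chars.copy()
            candidate[i] = substitute
            if classify("".join(candidate)) != original_label:
                return "".join(candidate)   # prediction flipped: attack succeeded
        # Commit the first substitute even if it did not flip the label,
        # so later positions can accumulate into a successful perturbation.
        if POLYPHONIC_DICT.get(ch):
            chars[i] = POLYPHONIC_DICT[ch][0]
            used += 1
    return "".join(chars)
```

The greedy, query-based loop reflects the black-box setting described in the abstract: only the model's predicted label is observed, and the substitution budget bounds the perturbation amplitude per sample.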
