Hierarchical State Regularization Variational AutoEncoder Based on Memristor Recurrent Neural Network

HU Xiaofang, YANG Tao

Citation: HU Xiaofang, YANG Tao. Hierarchical State Regularization Variational AutoEncoder Based on Memristor Recurrent Neural Network[J]. Journal of Electronics & Information Technology, 2023, 45(2): 689-697. doi: 10.11999/JEIT211431

doi: 10.11999/JEIT211431 cstr: 32379.14.JEIT211431
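Funds: The National Natural Science Foundation of China (61976246), The Natural Science Foundation of Chongqing (cstc2020jcyj-msxmX0385)
Details
    About the authors:

    HU Xiaofang: female, professor; research interests include memristors, neural networks, machine learning, and nonlinear systems and circuits

    YANG Tao: male, master's student; research interests include natural language processing and memristors

    Corresponding author:

    HU Xiaofang huxf@swu.edu.cn

  • CLC number: TN918.3; TN601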

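  • Abstract: As a powerful text generation model, the Variational AutoEncoder (VAE) has attracted increasing attention. During optimization, however, a VAE is prone to posterior collapse: it ignores the latent variable and degenerates into a plain autoencoder. To address this problem, this paper proposes a new VAE model that uses hierarchical encoding and state regularization to effectively mitigate posterior collapse, and that generates text of higher quality than the baseline models. On this basis, using nanoscale memristors, the proposed VAE is combined with a memristor Recurrent Neural Network (RNN) to design a hardware implementation scheme, the Hierarchical Variational AutoEncoder Memristor Neural Network (HVAE-MNN), and hardware acceleration of the model is discussed. Computer simulation experiments and result analysis verify the effectiveness and superiority of the proposed model.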
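To make the notion of posterior collapse concrete, the following is a minimal sketch of the KL term in the standard VAE objective, assuming a diagonal Gaussian posterior and a standard normal prior. It is not the hierarchical, state-regularized objective of the paper, which this page does not spell out; the function names `gaussian_kl` and `negative_elbo` and the parameter `kl_weight` are illustrative assumptions.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def negative_elbo(recon_nll, mu, log_var, kl_weight=1.0):
    """Negative ELBO: reconstruction NLL plus the (optionally weighted) KL term."""
    return recon_nll + kl_weight * gaussian_kl(mu, log_var)


# A collapsed posterior coincides with the prior, so its KL term is zero:
# the decoder receives no information through the latent variable.
collapsed_kl = gaussian_kl([0.0, 0.0], [0.0, 0.0])

# An informative posterior moves away from the prior and pays a positive KL cost.
informative_kl = gaussian_kl([1.0, -1.0], [0.0, 0.0])
```

Under this reading, a KL value near zero in the result tables signals that a model has largely stopped using its latent variable.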
  • Figure 1  Structure of the HSR-VAE model

    Figure 2  Memristor LSTM

    Table 1  Datasets

    Dataset      Train    Validation  Test    Vocabulary (k)
    PTB          42068    3370        3761    9.95
    Yelp         100000   10000       10000   19.76
    Yahoo        100000   10000       10000   19.73
    DailyDialog  11118    1000        1000    22

    Table 2  Comparison of language modeling results

                 PTB                     Yahoo
    Model        NLL↓    PPL↓    KL      NLL↓    PPL↓    KL
    VAE-LSTM     101.2   101.4   0.0     328.6   61.2    0.0
    SA-VAE       101.0   100.7   1.3     327.2   60.2    5.2
    Cyc-VAE      102.8   109.0   1.4     330.6   65.3    2.1
    Lag_VAE      100.9   99.8    7.2     326.7   59.8    5.7
    BN-VAE       100.2   96.9    7.2     327.4   60.2    8.8
    Sri-VAE      101.2   94.2    10.1    327.3   57.0    16.1
    TWR-VAE      86.6    40.9    5.0     317.3   50.2    3.3
    Ours         79.4    30.2    9.1     290.7   35.8    8.7
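The NLL and PPL columns are related quantities. The sketch below shows the common definition of perplexity as the exponential of the average per-token negative log-likelihood, assuming the NLL is measured in nats. How the paper normalizes sentence length (for example, whether the end-of-sentence token is counted) is not stated on this page, so the function `perplexity` and the numbers used here are purely illustrative.

```python
import math


def perplexity(total_nll, total_tokens):
    """Per-token perplexity: exp of the average negative log-likelihood (nats)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return math.exp(total_nll / total_tokens)


# Hypothetical corpus: two sentences with made-up NLLs (in nats) and lengths.
sentence_nlls = [60.0, 90.0]
sentence_lengths = [20, 30]

ppl = perplexity(sum(sentence_nlls), sum(sentence_lengths))
```

Because the exponent is an average over tokens, a modest drop in sentence-level NLL can translate into a much larger relative drop in PPL.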

    Table 3  Examples of text generated by the language models

    Model     Original text                          Generated text
    TWR-VAE   (1) it 's totally different            (1) it 's very ok
              (2) sec proposals may n                (2) terms officials may n
              (3) the test may come today            (3) the naczelnik may be
    Ours      (1) merrill lynch ready assets trust   (1) merrill lynch ready assets trust
              (2) all that now has changed           (2) what that now has changed
              (3) now it 's happening again          (3) now it 's quite again

    Table 4  Comparison of ablation study results

                        Yelp                            Yahoo
    Model               NLL↓    PPL↓    MI↑    KL       NLL↓    PPL↓    MI↑    KL
    TWR-VAE_RNN         395.4   56.4    3.9    0.5      363.0   88.2    4.1    0.6
    TWR-VAE_GRU         360.9   39.7    4.2    3.3      336.9   63.9    4.2    3.7
    TWR-VAE_LSTM        344.3   33.5    4.1    3.1      317.3   50.2    4.1    3.3
    Ours: RNN + RNN     400.9   57.3    2.6    1.3      366.3   90.8    3.2    2.4
    Ours: RNN + LSTM    340.3   31.3    3.7    5.1      303.2   41.7    3.3    6.0
    Ours: RNN + GRU     358.6   37.4    3.6    3.1      326.5   56.2    4.9    3.9
    Ours: LSTM + RNN    349.7   34.2    3.3    6.3      310.8   46.7    3.4    6.7
    Ours: LSTM + LSTM   340.4   31.1    3.7    10.2     310.4   45.6    3.5    8.8
    Ours: LSTM + GRU    341.5   31.4    3.3    7.0      295.7   38.1    3.5    7.9
    Ours: GRU + RNN     349.8   34.2    4.6    10.6     320.8   52.8    4.8    11.2
    Ours: GRU + LSTM    342.5   31.7    3.4    10.7     293.7   37.1    3.5    11.1
    Ours: GRU + GRU     336.6   29.9    3.5    7.3      290.7   35.8    3.4    8.7

    Table 5  Comparison of results on the dialogue response generation task

              BLEU                    BOW                     Intra-dist          Inter-dist
    Model     R↑      P↑      F1↑     A↑      E↑      G↑      dist-1↑  dist-2↑    dist-1↑  dist-2↑
    SeqGAN    0.270   0.270   0.270   0.918   0.495   0.774   0.747    0.806      0.075    0.081
    VHRED     0.271   0.260   0.265   0.892   0.507   0.786   0.633    0.771      0.071    0.089
    VHCR      0.289   0.266   0.277   0.925   0.525   0.798   0.768    0.814      0.105    0.129
    CVAE      0.265   0.222   0.242   0.923   0.543   0.811   0.938    0.973      0.177    0.222
    WAE       0.341   0.278   0.306   0.948   0.578   0.846   0.830    0.940      0.327    0.583
    iVAE      0.355   0.239   0.285   0.951   0.609   0.872   0.897    0.975      0.501    0.868
    TWR-VAE   0.407   0.281   0.333   0.952   0.603   0.865   0.921    0.990      0.497    0.817
    Ours      0.377   0.268   0.313   0.919   0.583   0.859   0.977    0.969      0.701    0.915
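The Intra-dist and Inter-dist columns measure response diversity. As a rough sketch, assuming the definition commonly used in dialogue-generation work (distinct-n is the ratio of unique n-grams to all n-grams; intra-dist averages it within each sampled response, inter-dist computes it over all sampled responses pooled together) and plain whitespace tokenization, the metrics could be computed as follows. The helper names are illustrative, and the paper's exact tokenization and sampling setup are not given on this page. The two sample strings are TWR-VAE responses from Table 6.

```python
def ngrams_of(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(ngrams):
    """Ratio of unique n-grams to total n-grams (0.0 if there are none)."""
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def intra_dist(responses, n):
    """Average distinct-n computed inside each response separately."""
    scores = [distinct_n(ngrams_of(r.split(), n)) for r in responses]
    return sum(scores) / len(scores)


def inter_dist(responses, n):
    """Distinct-n over the n-grams of all responses pooled together."""
    pooled = []
    for r in responses:
        pooled.extend(ngrams_of(r.split(), n))
    return distinct_n(pooled)


# Two sampled responses for the same context (whitespace tokenization assumed).
responses = [
    "hello, i'm looking for a job.",
    "hello, i'd like to buy a ticket to shanghai.",
]

intra_1 = intra_dist(responses, 1)
inter_1 = inter_dist(responses, 1)
```

Inter-dist is the stricter of the two here: responses that each look varied internally still score low if they repeat one another.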

    Table 6  Examples of generated dialogue responses

    Dialogue 1
    Question: come in, come in, what can i do for you ?
    Expected response: professor donald, are you giving your advanced geology course again next semester?
    TWR-VAE:
    (1) i would like to have a look at your computer.
    (2) hello, i'd like to buy a ticket to shanghai.
    (3) hello, i'm looking for a job.
    Ours:
    (1) who have been working.
    (2) and a lot of coffee.
    (3) some advice of humor.

    Dialogue 2
    Question: does the bathroom have a bath or a shower?
    Expected response: all our bathrooms have full size <unk> with a shower attachment.
    TWR-VAE:
    (1) yes, sir. everything is available. and we will charge it.
    (2) perhaps i can use a credit card to the bank. what would you like?
    (3) sure. does that mean will be there in your room?
    Ours:
    (1) sure, we have two. the room is <unk> and <unk>. it's $ 60 for a while.
    (2) that's a reserved room, but it has a maximum account with us today.
    (3) yes, that is the fee.
Publication history
  • Received:  2021-12-06
  • Revised:  2022-03-03
  • Accepted:  2022-03-08
  • Published online:  2022-03-12
  • Issue date:  2023-02-07
