Design of Lightweight Gated Recurrent Unit Network Model Based on Memristor
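Summary: Memristor-based gated recurrent unit (GRU) networks offer a new route to the embedded deployment of time-series data processing systems, but their large network size and high weight precision make them difficult to deploy directly on embedded edge devices. This paper therefore studies the design of a lightweight memristor-based GRU network model: it builds a GRU model that can be deployed on limited resources, designs the mapping onto memristor crossbar arrays, and proposes a fusion quantization method based on performance analysis and device awareness. Taking into account both network performance and the different device implementations of weight deployment and activation-function computation, the method quantizes the memristor GRU model with symmetric quantization for weights and asymmetric quantization for activations, and adds noise to the weights during training to improve the model's tolerance of memristor non-idealities. Simulation experiments show that the proposed model achieves a classification accuracy of 93.94% on the public UrbanSound8K dataset and 92.68% after quantization to 6 bits, which is 14.68, 0.68, and 3.94 percentage points higher than the full-precision Dilated Convolution, LM-MFCC+GRU, and TFFS-DNN models, respectively; weight-noise training effectively improves the lightweight model's adaptability to memristor non-idealities. The model is also validated on a true/false trajectory discrimination task: on a self-built true/false trajectory dataset it reaches a classification accuracy of 97.35%, which drops by only 0.84 percentage points after quantization to 6 bits.

Abstract: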
Objective: With the slowdown of CMOS technology scaling and the inherent memory-computation separation in von Neumann architectures, traditional computing systems face critical bottlenecks in processing increasingly large-scale data. Memristors, which offer high integration density, fast switching speed, and inherent synaptic plasticity, provide a promising pathway to overcome these limitations. Their crossbar arrays naturally support vector-matrix multiplication in the analog domain, enabling energy-efficient in-memory computing. Among sequential data processing models, the Gated Recurrent Unit (GRU) network has emerged as a key recurrent neural network variant, demonstrating superior performance in time-series tasks such as trajectory prediction and audio recognition. However, conventional hardware implementations of GRU networks suffer from frequent data movement between memory and processing units, leading to high energy consumption and low throughput. Although memristor-based GRU implementations offer significant advantages in energy efficiency and computational parallelism, the large parameter size and high weight precision of GRU networks impose substantial hardware costs and reliability challenges when deployed on resource-constrained memristor arrays. Furthermore, device non-idealities, including conductance fluctuations and nonlinear modulation, can substantially degrade model accuracy. Existing memristor-GRU solutions lack comprehensive consideration of these device imperfections, and current quantization methods treat weights and activations uniformly without accounting for their distinct hardware implementation constraints. This paper addresses these challenges through a hardware-algorithm co-design approach.

Methods: This paper proposes a lightweight memristor-based GRU network model. A 1T1R (one-transistor-one-resistor) memristor crossbar array is adopted for weight mapping and analog multiply-accumulate (MAC) operations.
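The analog MAC that a crossbar performs can be illustrated with a minimal sketch, assuming an idealized array: each cell contributes a current equal to its row voltage times its conductance (Ohm's law), and the currents of all cells on a column add up (Kirchhoff's current law). The function name `crossbar_mac` and all voltage and conductance values below are hypothetical; the access transistor of the 1T1R cell, wire resistance, and device noise are ignored.

```python
def crossbar_mac(voltages, conductances):
    """Idealized crossbar read: column current I_j = sum_i V_i * G_ij.

    voltages:     row input voltages in volts, one per crossbar row
    conductances: row-major matrix of cell conductances in siemens
    """
    n_rows = len(conductances)
    n_cols = len(conductances[0])
    if len(voltages) != n_rows:
        raise ValueError("one input voltage is needed per crossbar row")
    currents = [0.0] * n_cols
    for i in range(n_rows):
        for j in range(n_cols):
            # Ohm's law per cell; Kirchhoff's current law sums along the column.
            currents[j] += voltages[i] * conductances[i][j]
    return currents


# Hypothetical 3x2 array (values chosen only for illustration).
G = [[10e-6, 50e-6],
     [20e-6, 40e-6],
     [30e-6, 0.0]]
V = [0.2, 0.1, 0.0]
I = crossbar_mac(V, G)  # column currents in amperes
```

Each column current is one dot product of the input vector with a stored weight column, which is why a single read of the array amounts to a vector-matrix multiplication.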
To accommodate signed weights with strictly non-negative memristor conductances, each weight is mapped to a differential pair of conductances, one in a positive and one in a negative conductance matrix. A linear transformation with scaling and offset factors maps the trained weights to memristor conductance values. To address the distinct hardware implementation requirements of weights and activations, a fusion quantization method based on performance analysis and device awareness is introduced. Specifically, symmetric quantization is applied to the weights mapped to the memristor array, because the zero-centered quantization range simplifies write-driver circuit design by eliminating the need to store and compute a zero point. In contrast, asymmetric quantization is employed for activation values, which are computed in peripheral circuits without online programming of memristor states, thereby preserving the dynamic range and minimizing quantization error. To mitigate the impact of memristor conductance fluctuations, a weight noising training mechanism is incorporated into quantization-aware training (QAT). Gaussian noise, with intensity determined by the device variation parameter, is injected into the quantized weights during each forward propagation. This acts as a strong regularizer, guiding the model toward flatter regions of the loss landscape and toward robust features that are insensitive to weight perturbations. The straight-through estimator is used for gradient backpropagation, enabling updates to the full-precision floating-point weights while noise is sampled anew in each forward pass.

Results and Discussions: On the public UrbanSound8K dataset for urban sound classification, the proposed full-precision model achieves 93.94% classification accuracy.
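The weight and activation quantizers described in the Methods can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the clipping ranges, the conductance window `g_min`/`g_max`, and the particular linear level-to-conductance mapping are all assumptions chosen for the example.

```python
def quantize_symmetric(w, n_bits, w_max):
    """Symmetric quantizer for weights: the zero point is fixed at 0."""
    q_max = 2 ** (n_bits - 1) - 1            # 31 for 6 bits
    scale = w_max / q_max
    q = max(-q_max, min(q_max, round(w / scale)))
    return q, scale


def quantize_asymmetric(x, n_bits, x_min, x_max):
    """Asymmetric quantizer for activations: scale plus an integer zero point."""
    q_max = 2 ** n_bits - 1                  # 63 for 6 bits
    scale = (x_max - x_min) / q_max
    zero_point = round(-x_min / scale)
    q = max(0, min(q_max, round(x / scale) + zero_point))
    return q, scale, zero_point


def dequantize(q, scale, zero_point=0):
    """Recover the real value represented by an integer level."""
    return (q - zero_point) * scale


def to_differential_pair(q, q_max, g_min, g_max):
    """Map a signed level q to (G_pos, G_neg) with G_pos - G_neg proportional to q.

    Assumed linear mapping: `step` plays the role of the scaling factor and
    `g_min` the role of the offset.
    """
    step = (g_max - g_min) / q_max
    g_pos = g_min + step * max(q, 0)
    g_neg = g_min + step * max(-q, 0)
    return g_pos, g_neg


# Hypothetical values: a weight clipped to [-1, 1], an activation in [-0.5, 1].
q_w, s_w = quantize_symmetric(-0.37, n_bits=6, w_max=1.0)
q_a, s_a, zp_a = quantize_asymmetric(0.8, n_bits=6, x_min=-0.5, x_max=1.0)
g_pos, g_neg = to_differential_pair(q_w, q_max=31, g_min=1e-6, g_max=100e-6)
```

In this sketch a negative weight leaves `g_pos` at the minimum conductance and raises only `g_neg`, so the sign is carried by which device of the pair is programmed, while the symmetric quantizer never needs a stored zero point.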
After applying the fusion quantization method, the 6-bit quantized model maintains 92.68% accuracy, a degradation of only 1.26 percentage points despite an 81.25% reduction in weight precision (Table 1). The full-precision model surpasses all comparison models at their full-precision settings, namely Dilated Convolution (78.00%), LM-MFCC+GRU (92.00%), TFFS-DNN (88.74%), TFCNN (93.10%), and CL-Transformer (92.95%); the 6-bit model still exceeds the first three and trails TFCNN and CL-Transformer by less than 0.5 percentage points (Table 2). When evaluated under noisy input conditions with signal-to-noise ratios (SNR) ranging from -10 dB to 10 dB, the 6-bit quantized model exhibits comparable or superior robustness relative to its full-precision counterpart, with higher accuracy at every SNR level except 10 dB, demonstrating the effectiveness of the fusion quantization approach (Table 3). Analysis from storage, hardware, and device feasibility perspectives indicates that 6-bit quantization reduces weight storage from 5.6 MB to 1.05 MB, an 81.25% compression rate, and that only 2.8 million memristor cells are required under the 1T1R mapping scheme. Regarding robustness to device non-idealities, weight noising training significantly improves accuracy under conductance fluctuations: when the device variation range reaches 14%, it raises accuracy from 82.97% to 91.14%, and at the worst-case variation of 28%, from 54.23% to 87.01% (Fig. 7), confirming that the proposed training strategy effectively enhances model adaptability to memristor imperfections. On a self-constructed true/false trajectory dataset, the model achieves 97.35% accuracy at full precision and 96.51% at 6-bit quantization, a degradation of only 0.84 percentage points, and outperforms the Dilated Convolution baseline at quantization levels of 5 bits and above (Table 4). Furthermore, to demonstrate generalization across diverse sequential tasks, the model is evaluated on lithium-ion battery state-of-charge (SOC) estimation using a public dataset.
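The storage and cell-count figures quoted above can be reproduced with a short calculation, assuming (these assumptions are not stated in the text) a 32-bit floating-point baseline, 1 MB taken as 10^6 bytes, and one differential pair of cells per weight.

```python
FULL_PRECISION_BITS = 32   # assumed float32 baseline
QUANT_BITS = 6
FULL_PRECISION_MB = 5.6    # full-precision weight storage quoted in the text

# Fractional reduction in bits per weight: 1 - 6/32 = 0.8125.
precision_reduction = 1 - QUANT_BITS / FULL_PRECISION_BITS

# Storage scales with bits per weight: about 1.05 MB.
quantized_mb = FULL_PRECISION_MB * QUANT_BITS / FULL_PRECISION_BITS

# Weight count implied by 5.6 MB of 4-byte values (assuming 1 MB = 10**6 bytes):
# about 1.4 million weights.
n_weights = FULL_PRECISION_MB * 1e6 / (FULL_PRECISION_BITS / 8)

# Assumed: one differential pair, i.e. two cells, per signed weight:
# about 2.8 million cells.
n_cells = 2 * n_weights
```

Under these assumptions the numbers agree with the quoted 1.05 MB and 2.8 million cells, which suggests, without confirming, that each differential pair holds one 6-bit weight.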
The 6-bit quantized model achieves root mean square errors (RMSE) of 1.48%, 0.79%, and 0.74% at temperatures of 0°C, 25°C, and 45°C, respectively, outperforming the existing memristor-based GRU implementation and showing consistent superiority across all evaluated quantization bit-widths of 6 bits and above (Table 5).

Conclusions: This paper presents a lightweight GRU network model tailored for memristor-based hardware deployment. Through device-aware fusion quantization and weight noising training integrated with QAT, the model maintains high classification performance while achieving substantial memory compression and robustness to device non-idealities. Experimental results across multiple datasets and tasks confirm that the 6-bit quantized model retains competitive accuracy and demonstrates stable performance, providing a practical solution for deploying GRU networks on resource-constrained memristor-based edge computing platforms.
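As an illustration of how weight noising can be combined with QAT and the straight-through estimator, the toy example below trains a two-weight linear model. It is a sketch under stated assumptions, not the paper's training code: the additive Gaussian noise model with standard deviation `sigma`, the fixed clipping range, the learning rate, and the regression data are all made up for the example.

```python
import random


def fake_quantize(w, n_bits=6, w_max=1.0):
    """Symmetric quantize-then-dequantize of a single weight."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = w_max / q_max
    q = max(-q_max, min(q_max, round(w / scale)))
    return q * scale


def mse(weights, data):
    """Mean squared error of the linear model y = w . x on the data set."""
    total = 0.0
    for x, y in data:
        pred = sum(w * xi for w, xi in zip(weights, x))
        total += (pred - y) ** 2
    return total / len(data)


def train_with_weight_noise(data, sigma, steps=300, lr=0.1, seed=0):
    rng = random.Random(seed)
    w_fp = [0.0, 0.0]  # full-precision weights that receive the updates
    for _ in range(steps):
        # Forward pass: quantize, then add freshly sampled Gaussian noise
        # (assumed stand-in for conductance fluctuation).
        w_noisy = [fake_quantize(w) + rng.gauss(0.0, sigma) for w in w_fp]
        grad = [0.0, 0.0]
        for x, y in data:
            err = sum(w * xi for w, xi in zip(w_noisy, x)) - y
            for k in range(len(grad)):
                grad[k] += 2.0 * err * x[k] / len(data)
        # Straight-through estimator: treat d(w_noisy)/d(w_fp) as 1, so the
        # gradient evaluated at the noisy quantized weights updates w_fp.
        w_fp = [w - lr * g for w, g in zip(w_fp, grad)]
    return w_fp


# Toy regression target y = 0.5*x1 - 0.25*x2 (made-up data).
inputs = [(-1.0, 0.5), (0.5, 1.0), (1.0, -1.0),
          (0.0, 1.0), (-0.5, -0.5), (0.75, 0.25)]
data = [(x, 0.5 * x[0] - 0.25 * x[1]) for x in inputs]

initial_loss = mse([0.0, 0.0], data)
w_fp = train_with_weight_noise(data, sigma=0.05)
w_deploy = [fake_quantize(w) for w in w_fp]  # values that would be programmed
final_loss = mse(w_deploy, data)
```

The full-precision weights are never quantized in place; only the forward pass sees the quantized and perturbed values, which is what lets small gradient steps accumulate even when they are smaller than one quantization step.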
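Table 1. Classification performance of the lightweight GRU network model on the urban sound classification task

| Quantization precision (bit) | Classification accuracy (%) |
|---|---|
| 2 | 12.01 |
| 3 | 36.96 |
| 4 | 51.26 |
| 5 | 78.15 |
| 6 | 92.68 |
| 7 | 93.59 |
| 8 | 93.71 |
| 16 | 93.82 |

Table 2. Performance comparison with other models on the UrbanSound8K dataset

Table 3. Classification performance of the GRU network model on the UrbanSound8K dataset with noise added at different SNR levels

| SNR (dB) | Full-precision model accuracy (%) | 6-bit quantized model accuracy (%) |
|---|---|---|
| -10 | 34.44 | 38.79 |
| -5 | 69.91 | 70.25 |
| 0 | 79.52 | 85.24 |
| 5 | 84.04 | 87.30 |
| 10 | 92.11 | 90.96 |

Table 4. Classification performance of the lightweight GRU network model and the Dilated Convolution[15] model on the true/false trajectory discrimination task

| Quantization precision (bit) | Ours, accuracy (%) | Dilated Convolution[15], accuracy (%) |
|---|---|---|
| 2 | 63.44 | 65.53 |
| 3 | 63.72 | 75.53 |
| 4 | 68.28 | 90.14 |
| 5 | 91.40 | 90.65 |
| 6 | 96.51 | 90.56 |
| 7 | 97.16 | 90.65 |
| 8 | 97.26 | 90.60 |
| 16 | 97.30 | 90.98 |

Table 5. SOC estimation performance of different models on FUDS at various ambient temperatures, RMSE (%)

| Model | 0 ℃ | 25 ℃ | 45 ℃ |
|---|---|---|---|
| Memristor-based GRU[20] | 2.18 | 1.36 | 1.23 |
| Ours (full precision) | 1.26 | 0.58 | 0.56 |
| Ours (16 bit) | 1.57 | 0.62 | 0.52 |
| Ours (8 bit) | 1.39 | 0.58 | 0.52 |
| Ours (7 bit) | 1.55 | 0.73 | 0.75 |
| Ours (6 bit) | 1.48 | 0.79 | 0.74 |
| Ours (5 bit) | 2.64 | 1.86 | 1.76 |
| Ours (4 bit) | 13.33 | 10.50 | 11.75 |
| Ours (3 bit) | 22.62 | 23.37 | 24.29 |
| Ours (2 bit) | 22.24 | 22.81 | 23.82 |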
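The noisy-input evaluation summarized in Table 3 requires mixing noise into each clip at a prescribed SNR. One common way to do this is sketched below, assuming additive white Gaussian noise (the noise type used in the paper is not specified here); the function names and the synthetic 440 Hz test tone are hypothetical.

```python
import math
import random


def mean_power(samples):
    """Average power of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(signal, snr_db, seed=0):
    """Return (noisy_signal, noise) with 10*log10(P_signal / P_noise) = snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Noise power required for the target SNR, then the gain that reaches it.
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(signal, scaled)]
    return noisy, scaled


def measured_snr_db(signal, noise):
    """SNR in dB computed from the clean signal and the added noise."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


# Hypothetical 440 Hz tone sampled at 8 kHz stands in for an audio clip.
clip = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noisy, noise = add_noise_at_snr(clip, snr_db=-5)
```

At -10 dB the noise carries ten times the power of the signal, which gives a sense of how severe the lowest setting in Table 3 is.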
[1] STRUKOV D B, SNIDER G S, STEWART D R, et al. The missing memristor found[J]. Nature, 2008, 453(7191): 80–83. doi: 10.1038/nature06932.
[2] LECUN Y, BENGIO Y, and HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436–444. doi: 10.1038/nature14539.
[3] LIU Songzuo, WANG Qian, LI Lei, et al. Gated recurrent unit network of particle swarm optimization for drifting buoy trajectory prediction[J]. Journal of Electronics & Information Technology, 2024, 46(8): 3295–3304. doi: 10.11999/JEIT230945.
[4] WANG Jiayang, JI Xiaoyue, DONG Zhekang, et al. Circuit design of memristor-based GRU and its applications in SOC estimation[C]. 2023 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, USA, 2023: 1–5. doi: 10.1109/ICCE56470.2023.10043585.
[5] TONG Peiwen, XU Hui, SUN Yi, et al. Lightweight and highly robust memristor-based hybrid neural networks for electroencephalogram signal processing[J]. Chinese Physics B, 2023, 32(7): 078505. doi: 10.1088/1674-1056/ac9cbc.
[6] LI Yuankun, WANG Ze, ZHANG Qingtian, et al. NAS4CIM: Tailored neural network architecture search for RRAM-based compute-in-memory chips[J]. Journal of Electronics & Information Technology, 2025, 47(12): 4948–4958. doi: 10.11999/JEIT250978.
[7] LIN Hairong, DUAN Chenxing, DENG Xiaoheng, et al. Dual-memristor brain-like chaotic neural network and its application in IoMT data privacy protection[J]. Journal of Electronics & Information Technology, 2025, 47(7): 2194–2210. doi: 10.11999/JEIT241133.
[8] BALASKAS K, KARATZAS A, SAD C, et al. Hardware-aware DNN compression via diverse pruning and mixed-precision quantization[J]. IEEE Transactions on Emerging Topics in Computing, 2024, 12(4): 1079–1092. doi: 10.1109/TETC.2023.3346944.
[9] PERRIN M, GUICQUERO W, PAILLE B, et al. Hardware-aware Bayesian neural architecture search of quantized CNNs[J]. IEEE Embedded Systems Letters, 2025, 17(1): 42–45. doi: 10.1109/LES.2024.3434379.
[10] CHEN Junren, WU Huaqiang, GAO Bin, et al. Optimization strategy for accelerating multi-bit resistive weight programming on the RRAM array[C]. 2019 IEEE International Workshop on Future Computing (IWOFC), Hangzhou, China, 2019: 1–3. doi: 10.1109/IWOFC48002.2019.9078447.
[11] HONG Haiqiao, DU Zhiyuan, JIANG Mingrui, et al. Memristor-based adaptive analog-to-digital conversion for efficient and accurate compute-in-memory[J]. Nature Communications, 2025, 16(1): 9749. doi: 10.1038/s41467-025-65233-w.
[12] LI Can, WANG Zhongrui, RAO Mingyi, et al. Long short-term memory networks in memristor crossbar arrays[J]. Nature Machine Intelligence, 2019, 1(1): 49–57. doi: 10.1038/s42256-018-0001-4.
[13] HUANG Lixing, YU Hongqi, CHEN Changlin, et al. A training strategy for improving the robustness of memristor-based binarized convolutional neural networks[J]. Semiconductor Science and Technology, 2022, 37(1): 015013. doi: 10.1088/1361-6641/ac31e3.
[14] SUN Yi, XU Hui, WANG Chao, et al. A Ti/AlOx/TaOx/Pt analog synapse for memristive neural network[J]. IEEE Electron Device Letters, 2018, 39(9): 1298–1301. doi: 10.1109/LED.2018.2860053.
[15] CHEN Yan, GUO Qian, LIANG Xinyan, et al. Environmental sound classification with dilated convolutions[J]. Applied Acoustics, 2019, 148: 123–132. doi: 10.1016/j.apacoust.2018.12.019.
[16] PENG Ning, CHEN Aibin, ZHOU Guoxiong, et al. Environment sound classification based on visual multi-feature fusion and GRU-AWS[J]. IEEE Access, 2020, 8: 191100–191114. doi: 10.1109/ACCESS.2020.3032226.
[17] MU Wenjie, YIN Bo, HUANG Xianqing, et al. Environmental sound classification using temporal-frequency attention based convolutional neural network[J]. Scientific Reports, 2021, 11(1): 21552. doi: 10.1038/s41598-021-01045-4.
[18] WU Bo and ZHANG Xiaoping. Environmental sound classification via time–frequency attention and framewise self-attention-based deep neural networks[J]. IEEE Internet of Things Journal, 2022, 9(5): 3416–3428. doi: 10.1109/JIOT.2021.3098464.
[19] CHEN Xu, WANG Mei, KAN Ruixiang, et al. Improved patch-mix transformer and contrastive learning method for sound classification in noisy environments[J]. Applied Sciences, 2024, 14(21): 9711. doi: 10.3390/app14219711.
[20] CHEN Yanan, LUO Wei, CARTER M, et al. Organic electrode for non-aqueous potassium-ion batteries[J]. Nano Energy, 2015, 18: 205–211. doi: 10.1016/j.nanoen.2015.10.015.