Citation: LIU Jia, ZHANG Yangrui, CHEN Dapeng, MAO Die, LU Guorui. Bimodal Emotion Recognition Method based on Dual-stream Attention and Adversarial Mutual Reconstruction[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250424

Bimodal Emotion Recognition Method based on Dual-stream Attention and Adversarial Mutual Reconstruction

doi: 10.11999/JEIT250424 cstr: 32379.14.JEIT250424
Funds:  National Natural Science Foundation of China (62473200, 62003169), Jiangsu Province Youth Science and Technology Talent Support Project (JSTJ-2024-195), “Qinglan Project” of Jiangsu Province
  • Received Date: 2025-05-15
  • Accepted Date: 2025-12-12
  • Rev Recd Date: 2025-12-12
  • Available Online: 2025-12-19
  •   Objective  This paper proposes a multimodal emotion recognition method that integrates Electroencephalography (EEG) and speech signals to address the noise sensitivity and individual differences that limit single-modality emotion recognition. Despite progress in emotion recognition research, most methods still suffer from low cross-subject recognition accuracy and significant noise interference. EEG signals in particular show considerable variability in classification performance because of physiological differences across subjects, and speech signals are susceptible to noise and missing data. This work therefore develops a dual-modality method that combines EEG and speech signals to overcome these limitations and improve the stability and generalization of emotion recognition systems.
      Methods  The proposed method employs two independent feature extractors for EEG and speech signals. (1) For EEG, a dual feature extractor combining time-frame-channel joint attention with state-space modeling is designed to capture key temporal and spectral features. (2) For speech, a Bidirectional Long Short-Term Memory (Bi-LSTM) network with a frame-level random masking mechanism is used to improve robustness to missing or noisy speech data. (3) A modality refinement fusion module incorporates gradient reversal and orthogonal projection to optimize feature alignment and discrimination. (4) An adversarial mutual reconstruction mechanism ensures consistent emotion feature reconstruction across subjects in a shared latent space. (An illustrative sketch of the masking and fusion components follows the abstract.)
      Results and Discussions  The method is evaluated on several benchmark datasets, including MAHNOB-HCI, EAV, and SEED. In cross-subject validation on MAHNOB-HCI, the model achieves 81.09% accuracy for Valence and 80.11% for Arousal, outperforming several existing models. Five-fold cross-validation yields 98.14% accuracy for Valence and 98.37% for Arousal, indicating strong generalization and stability. On the EAV dataset, the model reaches 73.29% accuracy, well above the 60.85% of a conventional Convolutional Neural Network (CNN)-based baseline. In single-modality testing on the SEED dataset, it achieves 89.33% accuracy, confirming that the dual attention mechanism and adversarial mutual reconstruction improve cross-subject generalization.
      Conclusions  The proposed dual-stream attention and adversarial mutual reconstruction approach offers a promising solution to the challenges of cross-subject emotion recognition and multimodal fusion in affective computing. It provides a robust way to handle individual differences and noise in multimodal emotion recognition, with potential applications in real-world human-computer interaction systems.
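A minimal, hypothetical PyTorch sketch of two of the mechanisms named in the Methods is given below: frame-level random masking ahead of a Bi-LSTM speech encoder, and a refinement fusion step built from a gradient reversal layer and orthogonal projection. This is not the authors' implementation; the module names, feature dimensions, masking probability, and the subject-discriminator head are illustrative assumptions.

# Illustrative sketch only (assumed, not the authors' released code).
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def frame_level_random_mask(speech, mask_prob=0.15):
    """Zero out whole frames (time steps) at random to mimic missing or noisy speech.
    speech: (batch, time, feat); mask_prob is an assumed hyperparameter."""
    keep = torch.rand(speech.shape[:2], device=speech.device) > mask_prob
    return speech * keep.unsqueeze(-1)


class SpeechEncoder(nn.Module):
    """Bi-LSTM over (optionally masked) frame-level speech features."""
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, x):
        if self.training:
            x = frame_level_random_mask(x)
        out, _ = self.lstm(x)      # (batch, time, 2*hidden)
        return out.mean(dim=1)     # temporal average pooling -> (batch, 2*hidden)


class RefinementFusion(nn.Module):
    """Shared projection with an adversarial subject head behind gradient reversal;
    each modality keeps only the component orthogonal to the shared representation."""
    def __init__(self, dim=256, n_subjects=10):
        super().__init__()
        self.shared = nn.Linear(dim, dim)
        self.subject_head = nn.Linear(dim, n_subjects)  # trained adversarially

    @staticmethod
    def orthogonal_residual(x, shared):
        # Remove from x its projection onto the shared vector (per sample).
        coef = (x * shared).sum(-1, keepdim=True) / (shared.pow(2).sum(-1, keepdim=True) + 1e-8)
        return x - coef * shared

    def forward(self, eeg_feat, speech_feat, lambd=1.0):
        # eeg_feat, speech_feat: (batch, dim) embeddings; the EEG encoder is assumed
        # to output the same dimensionality as the speech encoder.
        shared = self.shared(eeg_feat + speech_feat)
        subject_logits = self.subject_head(GradReverse.apply(shared, lambd))
        fused = torch.cat([shared,
                           self.orthogonal_residual(eeg_feat, shared),
                           self.orthogonal_residual(speech_feat, shared)], dim=-1)
        return fused, subject_logits  # fused feeds the emotion classifier

In training, subject_logits would be penalized with a cross-entropy loss against subject labels; the reversed gradients then push the shared representation toward subject invariance, while the fused vector feeds the emotion classifier. The paper's adversarial mutual reconstruction of features across subjects in the shared latent space is not reproduced in this sketch.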