Citation: HUANG Chen, LIU Huijie, ZHANG Yan, YANG Chao, SONG Jianhua. Multimodal Hypergraph Learning Guidance with Global Noise Enhancement for Sentiment Analysis under Missing Modality Information[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250649

Multimodal Hypergraph Learning Guidance with Global Noise Enhancement for Sentiment Analysis under Missing Modality Information

doi: 10.11999/JEIT250649 cstr: 32379.14.JEIT250649
Funds: Wuhan Knowledge Innovation Special Project (202311901251001), Hubei Provincial Science and Technology Plan Major Science and Technology Special Project (2024BAA008), The Key Projects of Science and Technology in Shenzhen (2020N061)
  • Received Date: 2025-07-09
  • Rev Recd Date: 2025-10-14
  • Available Online: 2025-10-23
Objective  Multimodal Sentiment Analysis (MSA) has shown considerable promise in interdisciplinary domains such as Natural Language Processing (NLP) and Affective Computing, particularly by integrating information from ElectroEncephaloGraphy (EEG) signals, visual images, and text to classify sentiment polarity and provide a comprehensive understanding of human emotional states. However, in complex real-world scenarios, challenges including missing modalities, limited high-level semantic correlation learning across modalities, and the lack of mechanisms to guide cross-modal information transfer substantially restrict the generalization ability and accuracy of sentiment recognition models. To address these limitations, this study proposes a Multimodal Hypergraph Learning Guidance method with Global Noise Enhancement (MHLGNE), designed to improve the robustness and performance of MSA under missing modality information in complex environments.

Methods  The overall architecture of the MHLGNE model is illustrated in Figure 2 and consists of the Adaptive Global Noise Sampling Module, the Multimodal Hypergraph Learning Guiding Module, and the Sentiment Prediction Target Module. A pretrained language model is first applied to encode the multimodal input data. To simulate missing modality conditions, the input data are constructed with incomplete modal information, where a modality $m \in \{e, v, t\}$ (EEG, visual, or text) is randomly absent. The adaptive global noise sampling strategy then supplements missing modalities from a global perspective, improving adaptability and strengthening robustness and generalization in complex environments; this design allows the model to handle noisy data and missing modalities more effectively (a minimal sketch of this idea follows the paragraph). The Multimodal Hypergraph Learning Guiding Module is then applied to capture high-level semantic correlations across modalities, overcoming the limitation of conventional methods that rely only on feature alignment and fusion; by guiding cross-modal information transfer, it enables the model to focus on essential inter-modal semantic dependencies and thereby improves sentiment prediction accuracy (a sketch of a hypergraph convolution layer in this spirit follows the abstract). Finally, the performance of MHLGNE is compared with that of State-Of-The-Art (SOTA) MSA models under two conditions: complete modality data and randomly missing modality information.
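The abstract does not spell out the sampling procedure, so the following is a minimal, hypothetical sketch of the adaptive global noise sampling idea: a missing modality is filled with noise drawn from running global statistics of that modality's features, with a learnable per-modality scale. The class name GlobalNoiseSampler, the feature sizes, and the moving-average update are illustrative assumptions, not the paper's implementation.

# Hypothetical sketch: global-noise imputation for a randomly missing modality.
import torch
import torch.nn as nn

class GlobalNoiseSampler(nn.Module):
    """Fill missing modalities with noise drawn from running global feature statistics."""

    def __init__(self, feat_dims):
        super().__init__()
        # Non-trainable running mean/std per modality, updated from observed batches.
        self.means = nn.ParameterDict(
            {m: nn.Parameter(torch.zeros(d), requires_grad=False) for m, d in feat_dims.items()}
        )
        self.stds = nn.ParameterDict(
            {m: nn.Parameter(torch.ones(d), requires_grad=False) for m, d in feat_dims.items()}
        )
        # Learnable per-modality scale lets the model adapt the noise strength.
        self.scale = nn.ParameterDict({m: nn.Parameter(torch.ones(1)) for m in feat_dims})

    @torch.no_grad()
    def update_stats(self, modality, feats, momentum=0.1):
        # Exponential moving average over observed (non-missing) features.
        self.means[modality].mul_(1 - momentum).add_(momentum * feats.mean(dim=0))
        self.stds[modality].mul_(1 - momentum).add_(momentum * feats.std(dim=0))

    def forward(self, feats, missing_mask):
        # feats: {modality: (B, D) tensor}; missing_mask: {modality: (B,) bool tensor}.
        completed = {}
        for m, x in feats.items():
            noise = self.means[m] + self.scale[m] * self.stds[m] * torch.randn_like(x)
            mask = missing_mask[m].unsqueeze(-1).float()
            completed[m] = mask * noise + (1.0 - mask) * x  # replace only missing samples
        return completed

# Toy usage with the modalities named in the abstract: e (EEG), v (visual), t (text).
dims = {"e": 310, "v": 512, "t": 768}             # illustrative feature sizes
sampler = GlobalNoiseSampler(dims)
batch = {m: torch.randn(8, d) for m, d in dims.items()}
missing = {m: torch.rand(8) < 0.3 for m in dims}  # roughly 30% of samples miss each modality
filled = sampler(batch, missing)                  # dict of (8, D) tensors, gaps filled with noise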
Results and Discussions  Three publicly available MSA datasets (SEED-IV, SEED-V, and DREAMER) are employed, with features extracted from EEG signals, visual images, and text. To ensure robustness, standard cross-validation is applied, and training proceeds with iterative adjustments to the noise sampling strategy, modality fusion method, and hypergraph learning structure to optimize sentiment prediction. Under the complete modality condition, MHLGNE outperforms the second-best M2S model across most evaluation metrics, with accuracy improvements of 3.26%, 2.10%, and 0.58% on SEED-IV, SEED-V, and DREAMER, respectively; additional metrics also indicate advantages over other SOTA methods. Under the random missing modality condition, MHLGNE maintains superiority over existing MSA approaches, with improvements of 1.03% in accuracy, 0.24% in precision, and 0.08 in Kappa score. The adaptive noise sampling module effectively compensates for missing modalities: unlike conventional models that degrade under such conditions, MHLGNE remains robust by generating complementary information. In addition, the multimodal hypergraph structure captures high-level semantic dependencies across modalities, strengthening cross-modal information transfer and offering clear advantages when modalities are absent. Ablation experiments confirm the independent contribution of each module: removing either the adaptive noise sampling module or the multimodal hypergraph learning guiding module leads to notable performance declines, particularly under high-noise or severely missing modality conditions, and excluding the cross-modal information transfer mechanism causes a substantial drop in accuracy and robustness, highlighting its essential role in MSA.

Conclusions  The MHLGNE model, equipped with the Adaptive Global Noise Sampling Module and the Multimodal Hypergraph Learning Guiding Module, markedly improves MSA performance under missing modality conditions and in tasks requiring effective cross-modal information transfer. Experiments on SEED-IV, SEED-V, and DREAMER confirm that MHLGNE exceeds SOTA MSA models across multiple evaluation metrics, including accuracy, precision, Kappa score, and F1 score, demonstrating its robustness and effectiveness. Future work may refine the noise sampling strategy and develop more sophisticated hypergraph structures to further strengthen performance under extreme modality-missing scenarios; the framework may also be extended to broader sentiment analysis tasks across diverse application domains.
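For readers unfamiliar with hypergraph learning, the sketch below shows a standard HGNN-style hypergraph convolution, X' = Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2} X Theta, together with a toy incidence matrix that links the EEG, visual, and text nodes of each sample through one hyperedge. It only illustrates the kind of high-order cross-modal relation modeling that the Multimodal Hypergraph Learning Guiding Module builds on; it is not the paper's exact layer, and all names, sizes, and the hyperedge construction are illustrative assumptions.

# Hypothetical sketch of a standard HGNN-style hypergraph convolution layer.
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, X, H, edge_weight=None):
        # X: (N, in_dim) node features; H: (N, E) incidence matrix,
        # H[i, j] = 1 if node i belongs to hyperedge j.
        N, E = H.shape
        W = torch.ones(E, device=H.device) if edge_weight is None else edge_weight
        Dv = (H * W).sum(dim=1).clamp(min=1e-6)        # node degrees, (N,)
        De = H.sum(dim=0).clamp(min=1e-6)              # hyperedge degrees, (E,)
        Dv_inv_sqrt = Dv.pow(-0.5)
        left = Dv_inv_sqrt.unsqueeze(1) * H * W / De   # Dv^-1/2 H W De^-1, shape (N, E)
        right = H.t() * Dv_inv_sqrt.unsqueeze(0)       # H^T Dv^-1/2, shape (E, N)
        return (left @ right) @ self.theta(X)          # propagated node features, (N, out_dim)

# Toy usage: 4 samples x 3 modality nodes (EEG, visual, text). One hyperedge per
# sample links its three modality nodes, so message passing mixes information
# across modalities of the same sample.
n_samples, n_modal, dim = 4, 3, 16
X = torch.randn(n_samples * n_modal, dim)
H = torch.zeros(n_samples * n_modal, n_samples)
for s in range(n_samples):
    for m in range(n_modal):
        H[s * n_modal + m, s] = 1.0
layer = HypergraphConv(dim, 32)
out = torch.relu(layer(X, H))                          # (12, 32) updated node embeddings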