PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai. A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260063

A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception

doi: 10.11999/JEIT260063 cstr: 32379.14.JEIT260063
Funds: The National Natural Science Foundation of China (62377009)
  • Received Date: 2026-01-20
  • Accepted Date: 2026-04-23
  • Revision Received Date: 2026-04-20
  • Available Online: 2026-05-13
Objective: Multimodal sentiment analysis is often affected by visual noise from complex environments, image-text sentiment inconsistency, and imbalanced modality contributions. When all modalities are treated without distinction, visual noise can degrade model performance. A robust mechanism is therefore needed to evaluate visual confidence and filter redundant visual information.

Methods: A Multimodal Sentiment Analysis Model with Multi-Source Knowledge-guided Visual confidence Perception (MKVP) is proposed (Fig. 1). A multi-source knowledge guidance matrix is constructed using syntactic-dependency, sentiment-intensity, and aspect-focused operators (Fig. 2). Guided by this matrix, the Visual Confidence Perception (VCP) module measures semantic affinity and dynamically suppresses irrelevant visual noise (Fig. 3). A dual-stream parallel interaction module then supports deep cross-modal alignment, and a global gated fusion mechanism further adjusts the fusion weights of the different modalities.

Results and Discussions: Extensive experiments are conducted on the MVSA-Single, MVSA-Multiple, and HFM datasets. The proposed MKVP model achieves accuracy and F1 scores of 77.56% and 76.70% on MVSA-Single, 72.72% and 70.66% on MVSA-Multiple, and 87.26% and 86.78% on HFM. Compared with the baseline models, accuracy and F1 score are improved by 2.45% and 3.68%, 2.19% and 2.21%, and 1.83% and 1.91% on the three datasets, respectively (Table 3). Ablation studies show that each component contributes to performance, especially the VCP module, which filters visual noise and improves feature quality (Table 5). Feature-space visualization further confirms that the VCP module refines semantic representations by promoting clearer clustering of samples with the same sentiment polarity (Fig. 4). Case studies on mismatched image-text samples also verify the ability of the model to resolve cross-modal semantic conflicts (Table 6). Model-complexity analysis shows that MKVP maintains high computational efficiency and low inference latency (Table 8).

Conclusions: The proposed MKVP framework reduces the effects of visual noise and image-text sentiment inconsistency in multimodal sentiment analysis. By using multi-source knowledge to guide visual confidence perception and combining dual-stream interaction with dynamic gated fusion, the model learns robust sentiment representations from noisy multimodal data. This method provides an efficient and reliable solution for complex social media scenarios.
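
The abstract does not give the equations behind the guidance matrix, the VCP module, or the gated fusion, so the following is only a toy sketch of the general idea rather than the authors' implementation. It assumes that the three operator scores are simply summed and normalized into per-token weights, that "semantic affinity" is a guidance-weighted cosine similarity, and that the fusion gate is a single sigmoid scalar. All function names, the `sharpness` parameter, and the gate parameters are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def guidance_weights(syntactic, sentiment, aspect):
    """Assumption: combine three per-token operator scores by summing and normalizing."""
    raw = [s + e + a for s, e, a in zip(syntactic, sentiment, aspect)]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def visual_confidence(tokens, regions, weights, sharpness=5.0):
    """Assumption: confidence of each image region is a sigmoid of its
    guidance-weighted cosine affinity to the text tokens."""
    scores = []
    for region in regions:
        affinity = sum(w * cosine(tok, region) for w, tok in zip(weights, tokens))
        scores.append(sigmoid(sharpness * affinity))
    return scores

def weighted_mean(vectors, weights):
    """Pool vectors with the given weights; low-confidence regions contribute little."""
    dim = len(vectors[0])
    total = sum(weights)
    if total == 0:
        return [0.0] * dim
    return [sum(w * v[i] for w, v in zip(weights, vectors)) / total for i in range(dim)]

def gated_fusion(text_vec, visual_vec, gate_w, gate_b=0.0):
    """Assumption: one scalar gate computed from the concatenated features
    decides how much each modality contributes to the fused vector."""
    concat = text_vec + visual_vec
    gate = sigmoid(sum(w * x for w, x in zip(gate_w, concat)) + gate_b)
    fused = [gate * t + (1.0 - gate) * v for t, v in zip(text_vec, visual_vec)]
    return fused, gate
```

In this sketch a region that points away from the highly weighted tokens receives a confidence near zero and is almost removed from the pooled visual vector, which mirrors the noise-suppression role the abstract attributes to VCP. A trained model would learn the gate parameters and would typically use a per-dimension gate rather than a scalar.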