A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception

PENG Juhong; ZHANG Zhi; LIU Peng; GE Wenhui; LIU Chen; LIAO Lingxin; ZHANG Kai

doi:10.11999/JEIT260063

PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai. A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260063

doi: 10.11999/JEIT260063 cstr: 32379.14.JEIT260063

PENG Juhong^{1, 2},
ZHANG Zhi^{1, 2},
LIU Peng^{1, 2},
GE Wenhui^{1, 2},
LIU Chen^{1, 2},
LIAO Lingxin^{1, 2},
ZHANG Kai^{3}

1. School of Artificial Intelligence, Hubei University, Wuhan 430062, China
2. Key Laboratory of Intelligent Perception Systems and Security, Ministry of Education, Wuhan 430062, China
3. Wuchang Shipbuilding Industry Group Co., Ltd., Wuhan 430415, China

Funds: The National Natural Science Foundation of China (62377009)

Received Date: 2026-01-20
Revised Date: 2026-04-20
Accepted Date: 2026-04-23

Available Online: 2026-05-13

Abstract

Objective Multimodal sentiment analysis is often affected by visual noise from complex environments, image-text sentiment inconsistency, and imbalanced modality contributions. When all modalities are treated without distinction, visual noise can degrade model performance. A robust mechanism is therefore needed to evaluate visual confidence and filter redundant visual information.

Methods A Multimodal Sentiment Analysis Model with Multi-source Knowledge-guided Visual confidence Perception (MKVP) is proposed (Fig. 1). A multi-source knowledge guidance matrix is constructed using syntactic-dependency, sentiment-intensity, and aspect-focused operators (Fig. 2). Guided by this matrix, the Visual Confidence Perception (VCP) module measures the semantic affinity between text and image and dynamically suppresses irrelevant visual noise (Fig. 3). A dual-stream parallel interaction module then supports deep cross-modal alignment, and a global gated fusion mechanism further adjusts the fusion weights of the modalities.

Results and Discussions Extensive experiments are conducted on the MVSA-Single, MVSA-Multiple, and HFM datasets. The proposed MKVP model achieves accuracy and F1 scores of 77.56% and 76.70% on MVSA-Single, 72.72% and 70.66% on MVSA-Multiple, and 87.26% and 86.78% on HFM. Compared with the baseline models, accuracy and F1 score improve by 2.45% and 3.68%, 2.19% and 2.21%, and 1.83% and 1.91% on the three datasets, respectively (Table 3). Ablation studies show that each component contributes to performance, especially the VCP module, which filters visual noise and improves feature quality (Table 5). Feature-space visualization further confirms that the VCP module refines semantic representations by promoting clearer clustering of samples with the same sentiment polarity (Fig. 4). Case studies on mismatched image-text samples also verify the ability of the model to resolve cross-modal semantic conflicts (Table 6). Model-complexity analysis shows that MKVP maintains high computational efficiency and low inference latency (Table 8).

Conclusions The proposed MKVP framework reduces the effects of visual noise and image-text sentiment inconsistency in multimodal sentiment analysis. By using multi-source knowledge to guide visual confidence perception and combining dual-stream interaction with dynamic gated fusion, the model learns robust sentiment representations from noisy multimodal data. This method provides an efficient and reliable solution for complex social media scenarios.
Keywords:
- Multimodal sentiment analysis
- Multimodal fusion
- Multi-source knowledge guidance
- Visual confidence perception
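
The abstract describes two mechanisms without giving formulas: a confidence score that suppresses visual features with low semantic affinity to the text, and a gate that sets the fusion weights of the modalities. The sketch below illustrates the general idea only and is not the authors' implementation. It assumes cosine similarity as the affinity measure, a sigmoid with hypothetical `sharpness` and `threshold` parameters as the confidence function, and a single scalar gate with hypothetical `gate_weights`; the multi-source knowledge guidance matrix and the dual-stream interaction module are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sigmoid(x):
    """Logistic function; adequate for the moderate inputs used in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def visual_confidence(text_vec, regions, sharpness=5.0, threshold=0.3):
    """Score each visual region in (0, 1) by its affinity to the text vector.

    ASSUMPTION: cosine affinity passed through a shifted sigmoid. The paper's
    VCP module is guided by a knowledge matrix that is not modeled here.
    """
    return [sigmoid(sharpness * (cosine(text_vec, r) - threshold)) for r in regions]


def filter_visual(text_vec, regions, **kwargs):
    """Pool region features, weighting each region by its confidence score."""
    conf = visual_confidence(text_vec, regions, **kwargs)
    total = sum(conf)  # always positive because sigmoid outputs are positive
    dim = len(text_vec)
    pooled = [sum(c * r[i] for c, r in zip(conf, regions)) / total for i in range(dim)]
    return pooled, conf


def gated_fusion(text_vec, visual_vec, gate_weights, gate_bias=0.0):
    """Blend text and visual vectors with one scalar gate g in (0, 1).

    ASSUMPTION: g = sigmoid(w . [text; visual] + b); in a trained model the
    weights would be learned, here they are supplied by the caller.
    """
    joint = list(text_vec) + list(visual_vec)
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, joint)) + gate_bias)
    fused = [g * t + (1.0 - g) * v for t, v in zip(text_vec, visual_vec)]
    return fused, g


if __name__ == "__main__":
    text = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]  # first region matches the text, second does not
    pooled, conf = filter_visual(text, regions)
    fused, gate = gated_fusion(text, pooled, gate_weights=[0.0, 0.0, 0.0, 0.0])
    print("confidence:", conf)
    print("pooled visual:", pooled)
    print("gate:", gate, "fused:", fused)
```

In this toy setup the region aligned with the text receives a high confidence and dominates the pooled visual vector, while the orthogonal region is down-weighted rather than removed outright. Soft weighting of this kind keeps the operation differentiable, which is one common reason to prefer it over hard filtering.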
