A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception

PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai

Citation: PENG Juhong, ZHANG Zhi, LIU Peng, GE Wenhui, LIU Chen, LIAO Lingxin, ZHANG Kai. A Multimodal Sentiment Analysis Model with Multi-source Knowledge guided Visual Confidence Perception[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT260063

doi: 10.11999/JEIT260063 cstr: 32379.14.JEIT260063
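Funding: National Natural Science Foundation of China (62377009)
Details
    Author biographies:

    PENG Juhong: female, associate professor; research interests: information processing and artificial intelligence algorithms

    ZHANG Zhi: male, master's student; research interest: multimodal emotion recognition

    LIU Peng: male, master's student; research interests: multimodal fusion algorithms and object detection

    GE Wenhui: female, master's student; research interests: multimodal recognition and tracking

    LIU Chen: female, master's student; research interests: multimodal recognition and tracking

    LIAO Lingxin: male, master's student; research interest: multimodal emotion recognition

    ZHANG Kai: male, senior engineer; research interests: automated information integration and optimization algorithms

    Corresponding author:

    ZHANG Kai 29859491@qq.com

  • CLC number: TN911.7; TP391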

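  • Abstract: To address image-text inconsistency, low confidence in the visual modality, and unbalanced modality contributions in multimodal sentiment analysis, this paper proposes a multi-source knowledge guided visual confidence perception model (MKVP). First, a Visual Confidence Perception (VCP) module is built under multi-source knowledge guidance: textual syntax and fine-grained attribute priors are used to assess the quality of visual features, filtering out redundant image information caused by environmental interference and guiding the distribution of the visual features. Second, to keep the model from over-relying on the text modality and to balance modality contributions, a dual-stream parallel interaction module is designed; it uses cross-modal attention to promote deep, symmetric interaction between image and text features and strengthens the role of image features in supplementing and correcting text semantics. Finally, a global gated fusion mechanism dynamically adjusts the fusion weights according to each modality's global contribution, shifting the decision from single-modality dominance to balanced multimodal collaboration. On the MVSA-Single, MVSA-Multiple, and HFM datasets, MKVP reaches accuracy and F1 scores of 77.56% and 76.70%, 72.72% and 70.66%, and 87.26% and 86.78%, respectively; compared with the baseline model, accuracy and F1 improve by 2.45 and 3.68, 2.19 and 2.21, and 1.83 and 1.91 percentage points, respectively. These results indicate that the model can effectively mine deeper sentiment expression between images and text.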
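The global gated fusion idea in the abstract (fusion weights that follow each modality's global contribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gated_fusion`, the scalar score per modality, the gate vectors `w_text` and `w_image`, and the softmax normalisation are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(text_feat, image_feat, w_text, w_image):
    """Fuse two pooled feature vectors with weights from a global gate.

    Hypothetical form: each modality gets one scalar score (dot product of
    its pooled feature with a gate vector), and a softmax over the two
    scores gives the fusion weights.
    """
    score_t = sum(w * x for w, x in zip(w_text, text_feat))
    score_v = sum(w * x for w, x in zip(w_image, image_feat))
    g_t, g_v = softmax([score_t, score_v])
    fused = [g_t * t + g_v * v for t, v in zip(text_feat, image_feat)]
    return fused, (g_t, g_v)


# Toy pooled features; the values are made up for the example.
text_feat = [0.2, -0.1, 0.4]
image_feat = [0.5, 0.3, -0.2]
fused, gates = gated_fusion(text_feat, image_feat, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0])
print(gates)
```

Because the two weights come from a softmax they always sum to 1, so neither modality can be switched off entirely unless its score is far below the other's; that is one simple way to express "balanced" fusion, and other gate designs are equally plausible.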
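  • Figure 1  MKVP model framework

    Figure 2  Construction of the multi-source knowledge guidance matrix

    Figure 3  Structure of the VCP module

    Figure 4  t-SNE visualization of HFM before and after the VCP module

    Figure 5  Sensitivity analysis of the joint-loss hyperparameters on the MVSA-Single dataset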

    Table 1  Dataset statistics

    Dataset        Train  Val    Test   Total
    MVSA-Single    3611   450    450    4511
    MVSA-Multiple  13624  1700   1700   17024
    HFM            19816  2410   2409   24635

    Table 2  Experimental parameter settings

    Parameter                           MVSA-Single    MVSA-Multiple  HFM
    Batch size                          32             16             32
    Learning rate                       5E-5           2E-5           5E-5
    Epochs                              40 (all datasets)
    Optimizer                           AdamW (all datasets)
    Embedding dimension                 768 (all datasets)
    $\gamma_1$, $\gamma_2$, $\gamma_3$  1.0, 1.0, 1.0  1.0, 2.0, 1.0  1.0, 1.0, 1.0
    Dropout                             0.3            0.5            0.3
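The settings in Table 2 can be captured as a small per-dataset configuration. The sketch below copies the values from the table; the names `TrainConfig`, `CONFIGS`, and `joint_loss` are hypothetical, and the weighted-sum form of the joint loss is an assumption, since this page gives only the three weights and not the loss terms they multiply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 2 (field names are illustrative)."""
    batch_size: int
    learning_rate: float
    dropout: float
    loss_weights: tuple  # (gamma_1, gamma_2, gamma_3)
    epochs: int = 40
    optimizer: str = "AdamW"
    embed_dim: int = 768


CONFIGS = {
    "MVSA-Single": TrainConfig(32, 5e-5, 0.3, (1.0, 1.0, 1.0)),
    "MVSA-Multiple": TrainConfig(16, 2e-5, 0.5, (1.0, 2.0, 1.0)),
    "HFM": TrainConfig(32, 5e-5, 0.3, (1.0, 1.0, 1.0)),
}


def joint_loss(losses, weights):
    """Weighted sum of loss terms (assumed form of the joint loss)."""
    if len(losses) != len(weights):
        raise ValueError("need exactly one weight per loss term")
    return sum(w * l for w, l in zip(weights, losses))


cfg = CONFIGS["MVSA-Multiple"]
# Placeholder loss values, only to show how the weights would be applied.
print(joint_loss((0.5, 0.2, 0.1), cfg.loss_weights))
```

Keeping the shared values (epochs, optimizer, embedding dimension) as defaults makes the per-dataset differences, such as the larger second weight on MVSA-Multiple, easy to see.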

    Table 3  Comparison of MKVP with all baseline models on the three datasets ("-" marks a value not reported)

    Modality    Model          MVSA-Single     MVSA-Multiple   Model      HFM
                               Acc     F1      Acc     F1                 Acc     F1
    Text        CNN            0.6819  0.5590  0.6564  0.5766  CNN        0.8003  0.7572
                Bi-LSTM        0.7012  0.6506  0.6790  0.6790  Bi-LSTM    0.8190  0.7753
                BERT           0.7111  0.6970  0.6759  0.6624  BERT       0.8339  0.8326
                BiACNN         0.7036  0.6916  0.6847  0.6319  -          -       -
                TGNN           0.7034  0.6594  0.6967  0.6180  -          -       -
    Image       ResNet-50      0.6467  0.6155  0.6188  0.6098  ResNet-50  0.7277  0.7138
                ViT            0.6378  0.6226  0.6194  0.6119  ViT        0.7309  0.7152
    Text+Image  MultiSentiNet  0.6984  0.6984  0.6886  0.6811  Concat(3)  0.8174  0.7874
                MGNNS          0.7377  0.7270  0.7249  0.6934  D&R Net    0.8402  0.8060
                CLMLF          0.7511  0.7302  0.7053  0.6845  CLMLF      0.8543  0.8487
                GIGNN          0.7511  0.7333  0.7341  0.7096  GIGNN      0.8556  0.8487
                DIB            0.7605  0.7520  -       -       -          -       -
                MVCN           0.7606  0.7455  0.7207  0.7001  MVCN       0.8568  0.8523
                MFGFN          0.7622  0.7538  0.7082  0.6994  -          -       -
                D2R            0.7667  0.7559  0.7159  0.7085  D2R        0.8672  0.8625
                DTN            0.7711  0.7646  0.7070  0.6810  DTN        0.8697  0.8646
                MIGSIE         0.7640  0.7520  0.7272  0.7272  -          -       -
                MKVP           0.7756  0.7670  0.7272  0.7066  MKVP       0.8726  0.8678
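The improvements quoted in the abstract can be checked against Table 3. The sketch below assumes the comparison baseline is CLMLF; the page does not name the baseline, and CLMLF is simply the row whose differences from MKVP match the abstract. The dictionary and function names are illustrative.

```python
# (Acc, F1) pairs copied from Table 3.
CLMLF = {
    "MVSA-Single": (0.7511, 0.7302),
    "MVSA-Multiple": (0.7053, 0.6845),
    "HFM": (0.8543, 0.8487),
}
MKVP = {
    "MVSA-Single": (0.7756, 0.7670),
    "MVSA-Multiple": (0.7272, 0.7066),
    "HFM": (0.8726, 0.8678),
}


def gains_in_points(model, baseline):
    """Absolute gain in percentage points for each (Acc, F1) pair."""
    return {
        name: tuple(round((m - b) * 100, 2) for m, b in zip(model[name], baseline[name]))
        for name in model
    }


gains = gains_in_points(MKVP, CLMLF)
for name, (acc, f1) in gains.items():
    print(f"{name}: Acc +{acc:.2f}, F1 +{f1:.2f}")
```

The computed gains are 2.45 and 3.68 on MVSA-Single, 2.19 and 2.21 on MVSA-Multiple, and 1.83 and 1.91 on HFM, which agree with the abstract. They are absolute differences in percentage points, not relative improvements.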

    Table 4  Comparison of multi-source knowledge injection positions

    Model    MVSA-Single     MVSA-Multiple   HFM
             Acc     F1      Acc     F1      Acc     F1
    MKVP-II  0.7356  0.7128  0.7165  0.6866  0.8701  0.8651
    MKVP-IS  0.7289  0.7029  0.7200  0.6942  0.8651  0.8600
    MKVP-GL  0.7511  0.7339  0.7159  0.6795  0.8622  0.8557
    MKVP-LF  0.7489  0.7335  0.7165  0.6752  0.8552  0.8501
    MKVP     0.7756  0.7669  0.7272  0.7066  0.8726  0.8678

    Table 5  Ablation results

    Model        MVSA-Single     MVSA-Multiple   HFM
                 Acc     F1      Acc     F1      Acc     F1
    w/o VCP      0.7522  0.7355  0.7078  0.6879  0.8564  0.8510
    w/o JOL      0.7589  0.7428  0.7135  0.6890  0.8669  0.8523
    w/o CMI      0.7611  0.7494  0.7106  0.6945  0.8597  0.8561
    w/o GMF      0.7667  0.7558  0.7182  0.6987  0.8622  0.8552
    w/o V-J      0.7511  0.7401  0.7006  0.6833  0.8497  0.8446
    w/o V-J-C    0.7467  0.7361  0.7088  0.6814  0.8460  0.8413
    w/o V-J-C-G  0.7422  0.7354  0.6917  0.6710  0.8447  0.8394
    MKVP         0.7756  0.7669  0.7272  0.7066  0.8726  0.8678

    Table 6  Case comparison (sample images omitted; the first two columns are the ground-truth labels, the next five are model predictions)

    Image label  Text label  ResNet  BERT  MKVP-VCP  CLMLF  MKVP  Text
    Pos          Neu         Pos     Neg   Neu       Neu    Pos   Harshad’s second Missionn ? @har1603 what did you do??? #appalled
    Neu          Pos         Pos     Pos   Pos       Pos    Pos   RT @crashspain: Wonderful Turner Field Tour today. So excited for baseball season. Thanks @Braves @BravesReddit
    Neg          Neu         Neu     Neu   Neg       Neg    Neg   #abandoned #ruins #haikyo #urbex
    Neu          Neg         Pos     Pos   Pos       Pos    Neg   RT@AUFAMILY: Good wins over evil as there are once again two lives oaks at Toomer’s Corner. War Eagle! #ToomersForever

    Table 7  Results of the text noise-robustness experiment

    Noise type  Noise intensity (%)  MVSA-Multiple   HFM
                                     Acc     F1      Acc     F1
    Shuffle     10                   0.7124  0.6820  0.8618  0.8622
                30                   0.7065  0.6769  0.8502  0.8511
                50                   0.6924  0.6620  0.8419  0.8428
    No noise    -                    0.7272  0.7066  0.8726  0.8678
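One way to produce the "Shuffle" noise of Table 7 is to permute a fixed fraction of the tokens in each text. The page names only the noise type and its intensity, so the mechanism below is an assumption: `shuffle_noise` is a hypothetical helper that picks a fraction `intensity` of token positions and permutes the tokens among those positions, leaving the rest in place.

```python
import random


def shuffle_noise(tokens, intensity, seed=0):
    """Return a copy of `tokens` with a fraction `intensity` of the
    positions permuted among themselves (assumed noise model)."""
    rng = random.Random(seed)
    n = len(tokens)
    k = round(n * intensity)
    if k < 2:
        # Fewer than two selected positions: nothing to permute.
        return list(tokens)
    positions = rng.sample(range(n), k)  # distinct positions to disturb
    targets = positions[:]
    rng.shuffle(targets)                 # a permutation of those positions
    noisy = list(tokens)
    for src, dst in zip(positions, targets):
        noisy[dst] = tokens[src]
    return noisy


tokens = "so excited for baseball season today".split()
for level in (0.1, 0.3, 0.5):
    print(level, shuffle_noise(tokens, level))
```

Because only word order changes and no token is dropped or replaced, a model that leans heavily on word order should degrade faster under this noise than one that can fall back on the image; the fixed `seed` keeps runs reproducible.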

    Table 8  Model complexity and efficiency

    Model  Params (M)  FLOPs (G)  Time (ms)  MVSA-Single Acc
    MGNNS  73.78       48.41      14.24      0.7377
    CLMLF  205.52      24.07      9.46       0.7511
    D2R    345.54      25.39      36.21      0.7667
    MKVP   175.11      22.48      13.66      0.7756
Publication history
  • Received:  2026-01-20
  • Revised:  2026-04-20
  • Accepted:  2026-04-23
  • Published online:  2026-05-13
