Self-supervised Multimodal Emotion Recognition Combining Temporal Attention Mechanism and Unimodal Label Automatic Generation Strategy
-
Abstract: Most multimodal emotion recognition methods seek an effective fusion mechanism for constructing features from heterogeneous modalities, so as to learn feature representations with semantic consistency. However, these methods usually ignore the emotionally semantic differences between modalities. To solve this problem, a multi-task learning framework is proposed that jointly trains one multimodal task and three unimodal tasks, learning respectively the emotionally semantic consistency information among multimodal features and the emotionally semantic difference information contained in each modality. First, to learn the emotionally semantic consistency information, a Temporal Attention Mechanism (TAM) based on a multilayer recurrent neural network is proposed, in which the contribution of emotional features is described by assigning different weights to the time-series feature vectors. For multimodal fusion, fine-grained feature fusion is then carried out per semantic dimension in the semantic space. Second, to effectively learn the emotionally semantic difference information of each modality, a self-supervised Unimodal Label Automatic Generation (ULAG) strategy based on inter-modal feature vector similarity is proposed. Extensive experimental results on the CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets confirm that the proposed TAM-ULAG model is highly competitive, improving on both the classification metrics (Acc2, F1) and the regression metrics (MAE, Corr) of current benchmark models. For binary classification, the recognition accuracy is 87.2% on CMU-MOSI, 85.8% on CMU-MOSEI, and 81.47% on CH-SIMS. These results show that simultaneously learning the emotionally semantic consistency information across modalities and the emotionally semantic difference information of each modality helps improve the performance of self-supervised multimodal emotion recognition methods.
-
1. Introduction
Finite Element Analysis (FEA) discretizes a continuum into a finite number of elements, converting a problem with infinite degrees of freedom into one with finite degrees of freedom that can then be solved by computer [1]. Before the analysis, the object under study must be meshed, and the quality of the mesh directly affects the quality of the computed results. A high-quality mesh balances computation time against engineering accuracy requirements by setting a reasonable number of elements and an appropriate relative mesh density [2,3].
The finite element discretization error is the main basis for assessing the simulation accuracy of a finite element model. Error estimation methods can be divided into a priori and a posteriori estimation [4]. A posteriori error estimation has been studied extensively: Ref. [5] established a basic framework for evaluating discretization error based on the principles of minimizing the total potential energy and maximizing the strain energy; Ref. [6] defined an optimal mesh for linear elliptic problems using local error indicators and gave bounds on the error; Ref. [7] carried out fruitful research on error estimation and proposed measuring the discretization error as the difference between two stress fields in the energy norm, the effectiveness of which was demonstrated in Ref. [8]; Ref. [9] pointed out that, when convergence is guaranteed, the finite element method generally converges in terms of energy.
Determining the mesh element size is a major issue in finite element analysis. In this paper, three-dimensional human body models of different complexity are built in a high-voltage transmission environment; both adaptive meshing and manual meshing with progressively refined element sizes are studied, and the energy error is used as the criterion to guide model selection and the choice of the optimal mesh size.
2. Mesh Generation
2.1 Adaptive Finite Element Meshing
Adaptive meshing is a field-computation technique that, starting from an initial mesh, obtains a reasonable mesh by automatically estimating the meshing error and refining the mesh [10,11]. Mesh refinement is realized through an iterative loop, as shown in Fig. 1.
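As an illustration only, the refine-until-converged loop of Fig. 1 can be sketched as follows; the callables `solve`, `estimate_error`, and `refine` are hypothetical placeholders standing in for the solver's internals, not Ansoft Maxwell APIs.

```python
# Minimal sketch of the adaptive meshing loop (Fig. 1). The three callables wrap
# whatever field solver is used; they are placeholders, not real Ansoft Maxwell functions.
def adaptive_mesh(solve, estimate_error, refine, initial_mesh,
                  target_error_pct=5.0, max_passes=10):
    mesh = initial_mesh
    for _ in range(max_passes):
        solution = solve(mesh)                 # field computation on the current mesh
        err = estimate_error(mesh, solution)   # global energy error (see Section 2.2), in %
        if err <= target_error_pct:            # accuracy target reached: stop refining
            break
        mesh = refine(mesh, solution)          # subdivide elements with the largest local error
    return mesh, solution, err
```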
2.2 Basic Theory of Finite Element Discretization Error
In the finite element method, norms can be used to measure the error; the energy-norm measure estimates the energy error of every element in the model [12]. Errors are classified as local or global [13]: the error of each individual element is the local error, and summing all element errors gives the global error.
For a linear elasticity problem, let $\Omega$ be a bounded domain with boundary $\Gamma = \Gamma_t \cup \Gamma_\mu$. The governing equation can be written as
$$L\mu - f = S^{\mathrm{T}}DS\mu - f = 0 \tag{1}$$
with
$$\sigma = DS\mu, \quad \text{in } \Omega \tag{2}$$
$$\left.\begin{aligned} \boldsymbol{n}\cdot\sigma &= t, &&\text{on } \Gamma_t \\ \mu &= \bar{\mu}, &&\text{on } \Gamma_\mu \end{aligned}\right\} \tag{3}$$
where $L$ is a linear differential operator, $L = S^{\mathrm{T}}DS$; $\mu$ is the displacement; $f$ is the body force vector; $D$ is the elastic constant matrix; $S$ is the strain differential matrix; and $\sigma$ is the stress. Discretizing Eq. (1) gives the finite element equation
$$K\mu = R \tag{4}$$
where $K$ is the global stiffness matrix and $R$ is the load column vector.
The energy error of finite element $i$ is
$$e_i = \frac{1}{2}\int_{\Omega_i} (\sigma^* - \hat{\sigma})^{\mathrm{T}} [D]^{-1} (\sigma^* - \hat{\sigma})\,\mathrm{d}\Omega \tag{5}$$
where $\Omega_i$ is the bounded domain (volume) of element $i$, $[D]$ is the elasticity matrix, $(\sigma^* - \hat{\sigma})$ is the finite element stress error, $\hat{\sigma}$ is the approximate stress solution, and $\sigma^*$ is the improved stress solution obtained by smoothing $\hat{\sigma}$. Summing the energy errors of all elements gives the total energy error
$$e = \sum_{i=1}^{N_r} e_i \tag{6}$$
where $N_r$ is the total number of elements. In finite element software, the global energy error is used as the control criterion of the finite element computation and is calculated as
$$E = \left(\frac{e}{U + e}\right)^{\frac{1}{2}} \times 100\% \tag{7}$$
where $U$ is the total strain energy of the elements and $e$ is the total energy error. In this paper, the energy error computed with Eq. (7) is used as the accuracy criterion: the energy error of the transmission line finite element model is studied under different meshing schemes in order to select a suitable mesh size.
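A hedged numerical sketch of Eqs. (5)-(7) is given below; the arrays are toy data (not values from the transmission line model), and the element integral is approximated by a simple weighted sum over quadrature points.

```python
# Numerical sketch of Eqs. (5)-(7) on toy data.
import numpy as np

def element_energy_error(sigma_hat, sigma_star, D_inv, weights):
    """Eq. (5): e_i = 1/2 * integral of (sigma* - sigma_hat)^T [D]^{-1} (sigma* - sigma_hat)."""
    d = sigma_star - sigma_hat                      # stress error at each quadrature point, shape (q, 6)
    quad = np.einsum('qi,ij,qj->q', d, D_inv, d)    # quadratic form at each point
    return 0.5 * np.sum(weights * quad)             # weighted sum approximates the element integral

def global_energy_error(element_errors, total_strain_energy):
    """Eqs. (6)-(7): e = sum(e_i), E = sqrt(e / (U + e)) * 100%."""
    e = np.sum(element_errors)
    return np.sqrt(e / (total_strain_energy + e)) * 100.0

# Toy example: one element, 3 quadrature points, diagonal [D]^{-1}
rng = np.random.default_rng(0)
D_inv = np.eye(6) * 1e-11
e_i = element_energy_error(rng.normal(size=(3, 6)) * 1e6,
                           rng.normal(size=(3, 6)) * 1e6,
                           D_inv, weights=np.array([1e-7, 1e-7, 1e-7]))
print(global_energy_error(np.array([e_i]), total_strain_energy=1.0))
```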
3. Establishment of the Simulation Model
The core idea of finite element analysis is the discretization of the structure, i.e., finite element meshing. It is a crucial step in finite element analysis and directly affects the accuracy of the computed results [14]. Therefore, building the high-voltage transmission line system model and meshing it reasonably is the most important step in the finite element analysis of a high-voltage transmission line.
3.1 Finite Element Model of the Transmission Line
The model is based on the Z1-2 single-circuit tangent concrete single-pole head diagram (with L = 0.8 m) from the State Grid Corporation of China typical design for distribution network projects, shown in Fig. 2(a). The corresponding transmission line finite element model is built in the Ansoft Maxwell simulation software, as shown in Fig. 2(b), and a transient electric field study is carried out in Ansoft Maxwell.
Fig. 3 shows the simulated distribution of the spatial electric field strength without a human body present. For ease of observation, the maximum field strength of the 3D view is set to 30 kV/m and that of the cross-sectional view to 10 kV/m. The figures show that the electric field strength decreases progressively from the conductors outward.
3.2 Human Body Models
The human body has a complex structure and contains, because of its various biological molecules, a large number of charged particles that can be affected by an electric field. Under an applied electric field, human biological tissue exhibits both conductive and dielectric behavior [15]. Since this paper does not focus on the detailed analysis of any specific body part, three simplified 3D human body models of different complexity are built, as shown in Fig. 4.
The dimensions of the human body models are listed in Table 1.
Table 1  Dimensions of the human body models (m)
            Model I    Model II                          Model III
                       Legs    Body    Neck    Head      Legs    Body    Arms    Neck    Head
Radius      0.28       0.19    0.24    0.10    0.17      0.05    0.12    0.03    0.04    0.20
Height      1.71       0.70    0.65    0.08    --        0.80    0.70    0.75    0.08    --
3.3 Finite Element Simulation of a Practical Case
Based on the actual environment and object parameters, the transmission line finite element model is built in Ansoft Maxwell. Considering the possible influence of a human body in the space, a human body model is added to the simulation; the results are shown in Fig. 5. For ease of observation, the maximum field strength of the 3D view is set to 102 kV/m and that of the cross-sectional view to 10 kV/m. Comparing the results with Fig. 3 shows that the spatial electric field decays faster when the human body model is present. The field strength still decreases from the transmission conductors outward, but the field in the air near the human body is stronger than at the same location without the body, most notably at the head on the side facing the conductor. This indicates that the presence of the human body severely distorts the electric field in the surrounding air. The reason is that the relative permittivity of the human body is very large, so the body can be approximated as a grounded conductor, and the induced charge decreases as the distance between the body and the positive plate (the energized conductor) increases. The head on the side near the conductor is closest to the positive plate and therefore accumulates a large amount of induced charge, increasing the field strength; as the distance of each part of the body from the positive plate increases, the field strength decreases accordingly.
4. Mesh Optimization Simulation
4.1 Adaptive Meshing
For the three human body models, the height-weighted variance of the radii is computed and adaptive meshing simulations are carried out, yielding the variance and energy error of each model, as listed in Table 2.
Table 2  Variance of the human body models and the adaptive-meshing energy error
                    Model I    Model II   Model III
Energy error (%)    21.0735    22.4244    26.8348
Variance (m²)       0          0.036      0.050
Based on Table 2, the variation of the energy error with the model variance can be fitted, as shown in Fig. 6.
Fig. 6 shows that the energy error increases as the variance of the human body model increases. A larger variance indicates a more complex model, and a more complex model produces a larger error during finite element meshing. Therefore, the energy error of the human body model can be used to guide model selection.
4.2 Manual Meshing
Different mesh settings are applied to the human body models and simulated. From the finite element discretization error analysis in Section 2.2, the maximum energy error reflects the accuracy of the computed results to a certain extent, so the maximum energy error is chosen as the criterion for judging the error. This section analyzes how the energy error of the three human body models changes under manual mesh settings, as shown in Fig. 7. The figure shows that the energy error of the electric field increases as the element edge length of the meshed region increases.
The logarithmic functions fitted in Fig. 7 are
$$\left.\begin{aligned} y_{\mathrm{I}} &= 0.928\ln(x) + 22.093 \\ y_{\mathrm{II}} &= 0.0875\ln(x) + 21.937 \\ y_{\mathrm{III}} &= 0.3549\ln(x) + 24.625 \end{aligned}\right\} \tag{8}$$
Differentiating the three fitted logarithmic functions gives
$$\left.\begin{aligned} y'_{\mathrm{I}} &= \frac{0.928}{x} \\ y'_{\mathrm{II}} &= \frac{0.0875}{x} \\ y'_{\mathrm{III}} &= \frac{0.3549}{x} \end{aligned}\right\} \tag{9}$$
Fig. 7 also shows that a smaller energy error implies a higher computational complexity of the model, and hence more system resources and a longer analysis time. In practice, computation time, computer memory, and the required engineering accuracy must all be considered so that computational complexity and energy error reach a balance. Therefore, the point where $y' = 1$ is taken as the optimal meshing point, which gives optimal mesh sizes of 0.928 m, 0.0875 m, and 0.3549 m for model I, model II, and model III, respectively. If the same mesh size is chosen for the three models, e.g., 0.2 m and 0.5 m, the differences in the number of mesh elements are as listed in Table 3. The table shows that model II has the smallest change in both the global and the human-body element counts, followed by model III and then model I. Accordingly, the energy error of the simulation results varies the least for model II, followed by model III and then model I, so the resulting optimal mesh size $x$ is smallest for model II, followed by model III and then model I.
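The fit-and-balance step can be illustrated with a short sketch; the sample points below are synthetic (placed exactly on the fitted curve $y_{\mathrm{I}}$), since the raw data of Fig. 7 are not reproduced here.

```python
# Sketch: fit y = a*ln(x) + b to (mesh size, energy error) samples, then take the
# balance point where y' = a/x = 1, i.e. x = a, as the optimal mesh size.
import numpy as np

sizes  = np.array([0.1, 0.2, 0.5, 1.0, 2.0])     # element edge length (m), synthetic samples
errors = 0.928 * np.log(sizes) + 22.093          # synthetic energy errors on curve y_I

a, b = np.polyfit(np.log(sizes), errors, 1)      # least-squares fit of a and b
x_opt = a                                        # y' = a/x = 1  =>  x = a
print(f"y = {a:.4f} ln(x) + {b:.3f}; optimal mesh size x = {x_opt:.4f} m")
```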
Table 3  Change in element counts between mesh sizes 0.2 m and 0.5 m
                                         Model I    Model II   Model III
Global elements, mesh size 0.2 m         281086     282154     285858
Global elements, mesh size 0.5 m         277204     280179     283806
Difference in global elements            3882       1975       2052
Human-body elements, mesh size 0.2 m     2646       2769       4987
Human-body elements, mesh size 0.5 m     566        1579       3615
Difference in human-body elements        2080       1190       1372
5. Conclusion
Based on the analysis of adaptive finite element meshing and of the finite element discretization error, three human body models of different complexity are built with the Ansoft Maxwell simulation software, adaptive and manual meshing simulations are performed for the three models, and the trend of the energy error is analyzed. The results provide a reference for selecting a human body model and for estimating the optimal mesh size of different models, and are also of reference value for the study of other finite element meshing schemes.
-
Fig. 3  Label generation method based on the distance to the centers [27]
Algorithm 1  Self-supervised automatic unimodal label generation
Input: unimodal inputs I_t, I_a, I_v; multimodal label y_mm
Output: F^n_k, k ∈ {mm, t, a, v}, where n denotes the training epoch
1: Initialize the model parameters M(θ; x)
2: Initialize the unimodal labels y^1_t = y_mm, y^1_a = y_mm, y^1_v = y_mm
3: Initialize the global representations F^g_t = 0, F^g_a = 0, F^g_v = 0, F^g_mm = 0
4: for each training epoch do
5:   for each mini-batch in dataLoader do
6:     Compute the mini-batch modality representations F_t, F_a, F_v, F_mm with the temporal attention mechanism
7:     Compute the multimodal representation F*_mm and the predicted label ŷ_mm
8:     Compute the loss with the loss function in Eq. (23)
9:     Compute the gradient ∂L/∂θ
10:    Update the model parameters θ = θ − ∂L/∂θ
11:    if n ≠ 1 then
12:      Generate the similarity scores c_t, c_a, c_v and generate the unimodal labels y^n_t, y^n_a, y^n_v via weight attention
13:    end if
14:    Update the global representations F^g_s with F*_s, s ∈ {mm, t, a, v}
15:  end for
16: end for
Table 1  Statistics of the datasets
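Returning to Algorithm 1, a minimal PyTorch sketch of its training loop is given below. All module and function names are illustrative placeholders: the encoders stand in for the TAM encoders, the loss is a simple stand-in for Eq. (23), and the label update in steps 11-13 is only a schematic approximation of the ULAG strategy, not the paper's exact formula.

```python
# Illustrative sketch of Algorithm 1 (a single mini-batch per epoch is shown).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for a temporal-attention (TAM) encoder: LSTM + attention pooling over time."""
    def __init__(self, in_dim, hid=16):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hid, batch_first=True)
        self.attn = nn.Linear(hid, 1)

    def forward(self, x):                               # x: (batch, time, in_dim)
        h, _ = self.rnn(x)                              # (batch, time, hid)
        w = torch.softmax(self.attn(h), dim=1)          # temporal attention weights
        return (w * h).sum(dim=1)                       # (batch, hid)

B, T = 8, 20                                            # toy mini-batch
inputs = {'t': torch.randn(B, T, 32), 'a': torch.randn(B, T, 8), 'v': torch.randn(B, T, 16)}
y_mm = torch.randn(B, 1)                                # multimodal (regression) labels

enc = nn.ModuleDict({'t': TinyEncoder(32), 'a': TinyEncoder(8), 'v': TinyEncoder(16)})
heads = nn.ModuleDict({k: nn.Linear(16, 1) for k in ('t', 'a', 'v', 'mm')})
fuse = nn.Linear(16 * 3, 16)
params = list(enc.parameters()) + list(heads.parameters()) + list(fuse.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

y_uni = {k: y_mm.clone() for k in ('t', 'a', 'v')}      # step 2: unimodal labels start at y_mm

for epoch in range(1, 4):                               # step 4
    feats = {k: enc[k](inputs[k]) for k in ('t', 'a', 'v')}                 # step 6
    F_mm = fuse(torch.cat([feats['t'], feats['a'], feats['v']], dim=-1))    # step 7
    preds = {k: heads[k](feats[k]) for k in ('t', 'a', 'v')}
    preds['mm'] = heads['mm'](F_mm)

    # step 8: stand-in loss (multimodal term plus three unimodal regression terms)
    loss = F.l1_loss(preds['mm'], y_mm) + sum(F.l1_loss(preds[k], y_uni[k]) for k in ('t', 'a', 'v'))
    opt.zero_grad(); loss.backward(); opt.step()        # steps 9-10

    if epoch != 1:                                      # steps 11-13: schematic label update
        for k in ('t', 'a', 'v'):
            c = F.cosine_similarity(feats[k].detach(), F_mm.detach(), dim=-1).unsqueeze(-1)
            # the closer a modality is to the fused representation, the closer its label stays to y_mm
            y_uni[k] = y_mm + (1.0 - c) * (preds[k].detach() - preds['mm'].detach())
    # step 14 (omitted here): update the global representations F^g_s with F*_s
```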
Table 2  Hyperparameter settings of the model
Hyperparameter           CMU-MOSI   CMU-MOSEI   CH-SIMS
Batch size               32         32          32
Number of epochs         30         30          30
A-LSTM dropout rate      0.0        0.0         0.0
V-LSTM dropout rate      0.0        0.0         0.0
BERT dropout rate        0.1        0.1         0.1
V-LSTM hidden size       64         32          64
A-LSTM hidden size       32         32          16
Table 3  Comparison results of different models on the CMU-MOSI dataset
Model                  Acc2    F1      MAE     Corr    Acc7    Data setting
TFN [9]                73.90   73.40   0.970   0.633   32.10   Unaligned
RAVEN [32]             78.00   76.60   0.915   0.691   33.20   Aligned
MCTN [33]              79.36   79.16   0.909   0.676   35.60   Aligned
MulT [34]              81.12   81.08   0.889   0.686   40.00   Aligned
MMIM(B) [38]           84.14   84.00   0.700   0.800   46.65   Unaligned
MAG-BERT(B) [35]       86.10   86.00   0.712   0.796   /       Aligned
HMAI-BERT(B) [36]      86.30   86.28   0.727   0.810   /       Aligned
ConKI(B) [39]          84.37   84.33   0.681   0.816   48.43   Unaligned
MTSA(B) [41]           86.80   86.80   0.696   0.806   46.40   Unaligned
MISA(B) [37]           81.80   81.70   0.783   0.761   42.30   Aligned
SUGRM(B)⊗ [30]         84.40   84.30   0.703   0.800   /       Unaligned
SaPIL(B)⊗ [42]         83.65   82.51   0.704   0.794   /       Unaligned
TETFN(B)⊗ [28]         84.05   83.83   0.717   0.800   /       Unaligned
MTL-BAM(B)⊗ [40]       85.36   85.37   0.703   0.798   /       Unaligned
TPMSA(B)⊗ [43]         87.00   87.00   0.704   0.799   /       Unaligned
HIS-MSA(B)⊗ [29]       86.01   85.99   0.671   0.819   48.40   Unaligned
SELF-MM(B)⊗∙ [27]      85.02   85.12   0.713   0.798   45.04   Unaligned
TAM-ULAG (ours)        87.20   87.12   0.695   0.816   48.94   Unaligned
Table 4  Comparison results of different models on the CMU-MOSEI dataset
Model                  Acc2    F1      MAE     Corr    Acc7    Data setting
TFN [9]                74.30   73.40   0.720   0.497   50.20   Unaligned
RAVEN [32]             79.10   79.50   0.614   0.662   /       Aligned
MCTN [33]              79.80   80.60   0.609   0.670   /       Aligned
MulT [34]              82.50   82.30   0.580   0.703   48.80   Aligned
MMIM(B) [38]           82.50   82.39   0.577   0.716   52.78   Unaligned
MAG-BERT(B) [35]       85.10   85.06   0.555   0.758   52.67   Aligned
HMAI-BERT(B) [36]      85.56   85.52   0.588   0.816   /       Aligned
ConKI(B) [39]          82.73   83.08   0.529   0.782   54.25   Unaligned
MTSA(B) [41]           82.59   82.67   0.568   0.724   /       Unaligned
MISA(B) [37]           83.60   83.80   0.555   0.756   52.20   Aligned
SUGRM(B)⊗ [30]         83.70   83.60   0.544   0.748   /       Unaligned
SaPIL(B)⊗ [42]         82.98   83.26   0.523   0.766   /       Unaligned
TETFN(B)⊗ [28]         84.25   84.18   0.551   0.748   /       Unaligned
MTL-BAM(B)⊗ [40]       84.61   84.71   0.548   0.761   /       Unaligned
TPMSA(B)⊗ [43]         85.60   85.60   0.542   0.770   /       Unaligned
HIS-MSA(B)⊗ [29]       83.96   83.89   0.510   0.786   54.90   Unaligned
SELF-MM(B)⊗∙ [27]      82.81   82.53   0.530   0.765   52.80   Unaligned
TAM-ULAG (ours)        85.80   86.01   0.518   0.789   54.73   Unaligned
Table 5  Comparison results of different models on the CH-SIMS dataset
Table 6  Ablation results of different modules on the CMU-MOSI dataset
Model                     MAE     Corr    Acc2    F1      Acc7
w/o BERT                  0.713   0.791   84.75   84.81   46.81
w/o temporal attention    0.704   0.798   85.23   85.25   47.77
w/o multimodal fusion     0.701   0.801   84.30   84.21   46.03
w/o ULAG                  0.719   0.789   84.72   84.81   47.92
Use ULGM [27]             0.704   0.810   86.98   86.78   48.46
Use SUGRM [30]            0.700   0.811   87.00   86.98   48.53
TAM-ULAG (ours)           0.695   0.816   87.20   87.12   48.94
Table 7  Ablation results of different modality configurations on the CMU-MOSI dataset
Model         MAE     Corr    Acc2    F1
M             0.732   0.789   83.3    83.3
M, V          0.721   0.791   84.6    84.8
M, A          0.719   0.796   83.8    83.8
M, T          0.704   0.804   85.1    85.3
M, A, V       0.713   0.798   85.5    85.5
M, T, V       0.701   0.800   86.4    86.5
M, T, A       0.708   0.797   86.2    86.1
M, T, A, V    0.695   0.816   87.2    87.12
Table 8  Comparison of training time, testing time, and parameter counts on different datasets
Model             Dataset      Training time (ms/sample)   Testing time (ms/sample)   Parameters
SELF-MM [27]      CMU-MOSI     4.86                        1.57                       109 689 220
                  CMU-MOSEI    5.33                        1.98
                  CH-SIMS      5.22                        1.46
MTL-BAM [40]      CMU-MOSI     6.33                        2.23                       112 544 660
                  CMU-MOSEI    7.18                        2.71
                  CH-SIMS      /                           /
TAM-ULAG (ours)   CMU-MOSI     3.33                        1.19                       109 694 858
                  CMU-MOSEI    4.53                        1.43
                  CH-SIMS      4.21                        1.23
Table 9  Case analysis on the CMU-MOSI dataset
Example      Original label   Text label   Audio label   Visual label
Example 1    –2.4             –2.47        –2.13         –1.75
Example 2    1.8              2.33         –0.55         –0.89
Example 3    1.8              2.1          1.2           –1.4
-
[1] 曾子明, 孙守强, 李青青. 基于融合策略的突发公共卫生事件网络舆情多模态负面情感识别[J]. 情报学报, 2023, 42(5): 611–622. doi: 10.3772/j.issn.1000-0135.2023.05.009.ZENG Ziming, SUN Shouqiang, and LI Qingqing. Multimodal negative sentiment recognition in online public opinion during public health emergencies based on fusion strategy[J]. Journal of the China Society for Scientific and Technical Information, 2023, 42(5): 611–622. doi: 10.3772/j.issn.1000-0135.2023.05.009. [2] 姚鸿勋, 邓伟洪, 刘洪海, 等. 情感计算与理解研究发展概述[J]. 中国图象图形学报, 2022, 27(6): 2008–2035. doi: 10.11834/jig.220085.YAO Hongxun, DENG Weihong, LIU Honghai, et al. An overview of research development of affective computing and understanding[J]. Journal of Image and Graphics, 2022, 27(6): 2008–2035. doi: 10.11834/jig.220085. [3] KUMAR P and RAMAN B. A BERT based dual-channel explainable text emotion recognition system[J]. Neural Networks, 2022, 150: 392–407. doi: 10.1016/j.neunet.2022.03.017. [4] 曾义夫, 蓝天, 吴祖峰, 等. 基于双记忆注意力的方面级别情感分类模型[J]. 计算机学报, 2019, 42(8): 1845–1857. doi: 10.11897/SP.J.1016.2019.01845.ZENG Yifu, LAN Tian, WU Zufeng, et al. Bi-memory based attention model for aspect level sentiment classification[J]. Chinese Journal of Computers, 2019, 42(8): 1845–1857. doi: 10.11897/SP.J.1016.2019.01845. [5] BAKHSHI A, HARIMI A, and CHALUP S. CyTex: Transforming speech to textured images for speech emotion recognition[J]. Speech Communication, 2022, 139: 62–75. doi: 10.1016/j.specom.2022.02.007. [6] 黄程韦, 赵艳, 金赟, 等. 实用语音情感的特征分析与识别的研究[J]. 电子与信息学报, 2011, 33(1): 112–116. doi: 10.3724/SP.J.1146.2009.00886.HUANG Chengwei, ZHAO Yan, JIN Yun, et al. A study on feature analysis and recognition of practical speech emotion[J]. Journal of Electronics & Information Technology, 2011, 33(1): 112–116. doi: 10.3724/SP.J.1146.2009.00886. [7] 杨杨, 詹德川, 姜远, 等. 可靠多模态学习综述[J]. 软件学报, 2021, 32(4): 1067–1081. doi: 10.13328/j.cnki.jos.0061670.YANG Yang, ZHAN Dechuan, JIANG Yuan, et al. Reliable multi-modal learning: A survey[J]. Journal of Software, 2021, 32(4): 1067–1081. doi: 10.13328/j.cnki.jos.0061670. [8] 韩卓群. 基于多模态融合的情感识别技术研究与实现[D]. [硕士论文], 山东大学, 2022. doi: 10.27272/d.cnki.gshdu.2022.004451.HAN Zhuoqun. Research and implementation of emotion recognition technology based on multimodal fusion[D]. [Master dissertation], Shandong University, 2022. doi: 10.27272/d.cnki.gshdu.2022.004451. [9] ZADEH A, CHEN Minghai, PORIA S, et al. Tensor fusion network for multimodal sentiment analysis[C]. Proceedings of 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 2017: 1103–1114. doi: 10.18653/v1/D17-1115. [10] LIU Zhun, SHEN Ying, LAKSHMINARASIMHAN V B, et al. Efficient low-rank multimodal fusion with modality-specific factors[C]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018: 2247–2256. doi: 10.18653/v1/P18-1209. [11] MAI Sijie, HU Haifeng, and XING Songlong. Modality to modality translation: An adversarial representation learning and graph fusion network for multimodal fusion[C]. Proceedings of the 34th AAAI Conference on Artificial Intelligence, New York, USA, 2020: 164–172. doi: 10.1609/aaai.v34i01.5347. [12] WU Yang, LIN Zijie, ZHAO Yanyan, et al. A text-centered shared-private framework via cross-modal prediction for multimodal sentiment analysis[C]. Proceedings of the Findings of the Association for Computational Linguistics, Virtual, 2021: 4730–4738. doi: 10.18653/v1/2021.findings-acl.417. [13] YU Wenmeng, XU Hua, MENG Fanyang, et al. 
CH-SIMS: A Chinese multimodal sentiment analysis dataset with fine-grained annotation of modality[C]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 2020: 3718–3727. doi: 10.18653/v1/2020.acl-main.343. [14] ZADEH A A B, LIANG P P, PORIA S, et al. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph[C]. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 2018: 2236–2246. doi: 10.18653/v1/P18-1208. [15] BALTRUŠAITIS T, AHUJA C, and MORENCY L P. Multimodal machine learning: A survey and taxonomy[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(2): 423–443. doi: 10.1109/TPAMI.2018.2798607. [16] 赵小明, 杨轶娇, 张石清. 面向深度学习的多模态情感识别研究进展[J]. 计算机科学与探索, 2022, 16(7): 1479–1503. doi: 10.3778/j.issn.1673-9418.2112081.ZHAO Xiaoming, YANG Yijiao, and ZHANG Shiqing. Survey of deep learning based multimodal emotion recognition[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(7): 1479–1503. doi: 10.3778/j.issn.1673-9418.2112081. [17] KAUR R and KAUTISH S. Multimodal sentiment analysis: A survey and comparison[M]. Information Resources Management Association. Research Anthology on Implementing Sentiment Analysis Across Multiple Disciplines. IGI Global, 2022: 1846–1870. doi: 10.4018/978-1-6684-6303-1.ch098. [18] KAUR R and KAUTISH S. Multimodal sentiment analysis: A survey and comparison[J]. International Journal of Service Science Management Engineering and Technology, 2019, 10(2): 38–58. doi: 10.4018/IJSSMET.2019040103. [19] ZHANG J, XING L, TAN Z, et al. Multi-head attention fusion networks for multi-modal speech emotion recognition[J]. Computers & Industrial Engineering, 2022, 168: 108078. doi: doi: 10.1016/j.cie.2022.108078. [20] GHOSAL D, AKHTAR S, CHAUHAN D, et al. Contextual inter-modal attention for multi-modal sentiment analysis[C]. Proceedings of 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2018: 3454–3466. doi: 10.18653/v1/D18-1382. [21] MAI Sijie, HU Haifeng, and XING Songlong. Divide, conquer and combine: Hierarchical feature fusion network with local and global perspectives for multimodal affective computing[C]. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019: 481–492. doi: 10.18653/v1/P19-1046. [22] MAI Sijie, XING Songlong, and HU Haifeng. Locally confined modality fusion network with a global perspective for multimodal human affective computing[J]. IEEE Transactions on Multimedia, 2020, 22(1): 122–137. doi: 10.1109/TMM.2019.2925966. [23] HE Jiaxuan, MAI Sijie, and HU Haifeng. A unimodal reinforced transformer with time squeeze fusion for multimodal sentiment analysis[J]. IEEE Signal Processing Letters, 2021, 28: 992–996. doi: 10.1109/LSP.2021.3078074. [24] 王汝言, 陶中原, 赵容剑, 等. 多交互图卷积网络用于方面情感分析[J]. 电子与信息学报, 2022, 44(3): 1111–1118. doi: 10.11999/JEIT210459.WANG Ruyan, TAO Zhongyuan, ZHAO Rongjian, et al. Multi-interaction graph convolutional networks for aspect-level sentiment analysis[J]. Journal of Electronics & Information Technology, 2022, 44(3): 1111–1118. doi: 10.11999/JEIT210459. [25] DEVLIN J, CHANG Mingwei, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]. Proceedings of 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, Minnesota, 2019: 4171–4186. 
doi: 10.18653/v1/N19-1423. [26] SIRIWARDHANA S, KALUARACHCHI T, BILLINGHURST M, et al. Multimodal emotion recognition with transformer-based self supervised feature fusion[J]. IEEE Access, 2020, 8: 176274–176285. doi: 10.1109/ACCESS.2020.3026823. [27] YU Wenmeng, XU Hua, YUAN Ziqi, et al. Learning modality-specific representations with self-supervised multi-task learning for multimodal sentiment analysis[C]. Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2021: 10790–10797. doi: 10.1609/aaai.v35i12.17289. [28] WANG Di, GUO Xutong, TIAN Yumin, et al. TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis[J]. Pattern Recognition, 2023, 136: 109259. doi: 10.1016/j.patcog.2022.109259. [29] ZENG Yufei, LI Zhixin, TANG Zhenjun, et al. Heterogeneous graph convolution based on In-domain Self-supervision for Multimodal Sentiment Analysis[J]. Expert Systems with Applications, 2023, 213: 119240. doi: 10.1016/j.eswa.2022.119240. [30] HWANG Y and KIM J H. Self-supervised unimodal label generation strategy using recalibrated modality representations for multimodal sentiment analysis[C]. Proceedings of the Findings of the Association for Computational Linguistics, Dubrovnik, Croatia, 2023: 35–46. doi: 10.18653/v1/2023.findings-eacl.2. [31] CHOI E, BAHADORI M T, KULAS J A, et al. RETAIN: An interpretable predictive model for healthcare using reverse time attention mechanism[C]. Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 3512–3520. [32] WANG Yansen, SHEN Ying, LIU Zhun, et al. Words can shift: Dynamically adjusting word representations using nonverbal behaviors[C]. Proceedings of 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 7216–7223. doi: 10.1609/aaai.v33i01.33017216. [33] PHAM H, LIANG P P, MANZINI T, et al. Found in translation: Learning robust joint representations by cyclic translations between modalities[C]. Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 6892–6899. doi: 10.1609/aaai.v33i01.33016892. [34] TSAI Y H H, BAI Shaojie, LIANG P P, et al. Multimodal transformer for unaligned multimodal language sequences[C]. Proceedings of the 57th Association for Computational Linguistics, Florence, Italy, 2019: 6558. doi: 10.18653/v1/P19-1656. [35] RAHMAN W, HASAN K, LEE S, et al. Integrating multimodal information in large pretrained transformers[C]. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 2020: 2359–2369. doi: 10.18653/v1/2020.acl-main.214. [36] ZHAO Xianbing, CHEN Yixin, CHEN Yiting, et al. HMAI-BERT: Hierarchical multimodal alignment and interaction network-enhanced BERT for multimodal sentiment analysis[C]. Proceedings of 2022 IEEE International Conference on Multimedia and Expo, Taipei, China, 2022: 1–6. doi: 10.1109/ICME52920.2022.9859747. [37] HAZARIKA D, ZIMMERMANN R, and PORIA S. MISA: Modality-invariant and-specific representations for multimodal sentiment analysis[C]. Proceedings of the 28th ACM International Conference on Multimedia, Seattle, United States, 2020: 1122–1131. doi: 10.1145/3394171.3413678. [38] HAN Wei, CHEN Hui, and PORIA S. Improving multimodal fusion with hierarchical mutual information maximization for multimodal sentiment analysis[C]. Proceedings of Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021: 9180–9192. doi: 10.18653/v1/2021.emnlp-main.723. 
[39] YU Yakun, ZHAO Mingjun, QI Shiang, et al. ConKI: Contrastive knowledge injection for multimodal sentiment analysis[C]. Proceedings of the Findings of the Association for Computational Linguistics, Toronto, Canada, 2023: 13610–13624. doi: 10.18653/v1/2023.findings-acl.860. [40] XIE Jinbao, WANG Jiyu, WANG Qingyan, et al. A multimodal fusion emotion recognition method based on multitask learning and attention mechanism[J]. Neurocomputing, 2023, 556: 126649. doi: 10.1016/j.neucom.2023.126649. [41] YANG Bo, SHAO Bo, WU Lijun, et al. Multimodal sentiment analysis with unidirectional modality translation[J]. Neurocomputing, 2022, 467: 130–137. doi: 10.1016/j.neucom.2021.09.041. [42] LAI Songning, HU Xifeng, LI Yulong, et al. Shared and private information learning in multimodal sentiment analysis with deep modal alignment and self-supervised multi-task learning[J]. arXiv preprint arXiv: 2305.08473, 2023. [43] YANG Bo, WU Lijun, ZHU Jinhua, et al. Multimodal sentiment analysis with two-phase multi-task learning[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, 30: 2015–2024. doi: 10.1109/TASLP.2022.3178204.
-