Cognitive Emotional Interaction Model of Robot Based on Reinforcement Learning
-
Abstract: To enhance the cognitive emotional computing ability of robots, a cognitive emotional interaction model based on reinforcement learning is proposed. Built on the PAD (Pleasure-Arousal-Dominance) emotional space, the model generates the robot's cognitive emotion by combining immediate feedback with the long-term trend. Firstly, following the psychology of interpersonal communication, the human emotion generation process is simulated to generate human-like emotions, and three influencing factors are extracted: similarity, positivity, and empathy. Secondly, the global coordination property of reinforcement learning is used to relate the response emotional state to the long-term emotional state of the context, thereby modeling the robot's emotion generation process. Then, the three factors are incorporated into the model's reward mechanism to evaluate the interactive emotional state, so that the model is updated and the optimal emotional strategy is obtained. Finally, the optimal emotional state corresponding to this strategy is used to update the robot's emotional state transition probability, and, based on the emotion values of the six basic emotional states in the space, the result is mapped into the continuous emotional space to obtain the robot's optimal response emotion value. Subjective and objective comparison experiments show that the proposed model effectively increases the delicacy, continuity, positivity, and empathy of the robot's emotional expression, reduces the robot's dependence on external emotional stimuli, and further promotes harmonious and friendly human-computer interaction.
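The final step described in the abstract, mapping a transition probability over six basic emotional states into the continuous PAD space, can be illustrated with a minimal sketch. The abstract does not state the mapping itself, so the code below assumes a probability-weighted average of the basic states' PAD coordinates. The emotion names, the coordinate values, and the function name `response_emotion_value` are hypothetical placeholders, not values or identifiers from the paper.

```python
from typing import Dict, Tuple

PAD = Tuple[float, float, float]  # (pleasure, arousal, dominance)

# Hypothetical PAD coordinates for six basic emotions.
# Placeholder values for illustration only; they are NOT taken from the paper.
BASIC_PAD: Dict[str, PAD] = {
    "happiness": (0.8, 0.5, 0.4),
    "surprise": (0.4, 0.7, -0.1),
    "fear": (-0.6, 0.6, -0.4),
    "sadness": (-0.6, -0.3, -0.3),
    "disgust": (-0.6, 0.3, 0.1),
    "anger": (-0.5, 0.6, 0.3),
}


def response_emotion_value(transition_probs: Dict[str, float]) -> PAD:
    """Map a distribution over basic emotions to one point in PAD space.

    Assumption: the continuous response emotion is the probability-weighted
    average of the basic states' coordinates. States missing from
    `transition_probs` are treated as having probability zero.
    """
    unknown = set(transition_probs) - set(BASIC_PAD)
    if unknown:
        raise ValueError(f"unknown emotional states: {sorted(unknown)}")
    if any(p < 0 for p in transition_probs.values()):
        raise ValueError("probabilities must be non-negative")
    total = sum(transition_probs.values())
    if total <= 0:
        raise ValueError("at least one probability must be positive")

    coords = [0.0, 0.0, 0.0]
    for name, p in transition_probs.items():
        weight = p / total  # normalize so the weights sum to one
        for axis in range(3):
            coords[axis] += weight * BASIC_PAD[name][axis]
    return (coords[0], coords[1], coords[2])
```

Under this assumption, a distribution concentrated on one basic state returns that state's coordinates, while a mixed distribution yields an intermediate point, which is one way a discrete state model could produce the finer-grained, continuous emotional expression the abstract describes.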
-
Table 1 Cognitive emotional interaction model of robot based on reinforcement learning
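Input: the participant's input emotion value $E_{\rm HR}^{k}$ in interaction round $k$; the robot's emotional state transition probability $P_{\rm RH}^{k-1}$ in interaction round $k-1$;
Output: the robot's response emotion value $E_{\rm RH}^{k+1}$ in interaction round $k+1$;
Begin:
  The participant inputs the interaction emotion $E_{\rm HR}^{k}$;
  Evaluate the emotional state of $E_{\rm HR}^{k}$ according to Eqs. (1)-(3) to obtain $I(E_{\rm HR}^{k})$;
  According to Eqs. (4)-(7), compute the reward $R(a|I(E_{\rm HR}^{k}))$ that the robot obtains when, in the current emotional state $I(E_{\rm HR}^{k})$, it executes an action that produces the next emotional state;
  Train the model according to Eqs. (8)-(10), obtaining the optimal parameters by maximizing the expected cumulative reward, which yields the optimal emotional strategy $p$;
  With the selected optimal emotional strategy $p$, compute the response emotional state transition probability $P(E_{k+1}|E_{\rm HR}^{k})$ for the interaction input according to Eq. (11);
  With the obtained $P(E_{k+1}|E_{\rm HR}^{k})$, update the robot's emotional state transition probability $P_{\rm RH}^{k-1}$ of round $k-1$ according to Eqs. (12)-(13), obtaining the robot's emotional state transition probability $P_{\rm RH}^{k+1}$ of round $k+1$;
  With the obtained $P_{\rm RH}^{k+1}$, calibrate the spatial coordinates of the optimal response emotion value $E_{\rm RH}^{k+1}$ according to Eqs. (14)-(15), realizing the robot's state transition in the continuous emotional space;
  The robot outputs the response emotion $E_{\rm RH}^{k+1}$;
  Let $k = k + 2$;
Until the participant stops the interaction input
End of the human-computer interaction

Table 2 Emotion accuracy of different models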
Cognitive model    Accuracy
ECM                0.775
GCRs               0.831
E-SCBA             0.792
SentiGAN           0.846
Proposed           0.865

Table 3 Ranking accuracy of different models
Cognitive model    MAP      MRR
Chatterbot         0.466    0.438
ECM                0.608    0.587
GCRs               0.641    0.623
E-SCBA             0.634    0.625
SentiGAN           0.637    0.628
Proposed           0.657    0.646

Table 4 Number of conversation rounds and interaction time
Cognitive model    N (rounds)    T (s)
Chatterbot         6             68.56
ECM                9             97.30
GCRs               13            132.84
E-SCBA             11            114.57
SentiGAN           10            107.29
Proposed           15            145.41
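Table 3 reports MAP and MRR without stating how they were computed. The sketch below uses the standard definitions of mean average precision and mean reciprocal rank over ranked candidate lists with binary relevance labels. The function names and the toy rankings are hypothetical and are not data from the experiments.

```python
from typing import Callable, Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """Return 1 / rank of the first relevant item, or 0.0 if none is relevant."""
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Average of precision@rank taken at each relevant position.

    Assumes every relevant candidate appears somewhere in the ranked list,
    so the sum is normalized by the number of relevant items found.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_over_queries(metric: Callable[[Sequence[int]], float],
                      rankings: Sequence[Sequence[int]]) -> float:
    """Average a per-query metric over all ranked lists."""
    if not rankings:
        raise ValueError("at least one ranking is required")
    return sum(metric(r) for r in rankings) / len(rankings)


# Toy example: each list marks ranked candidates as relevant (1) or not (0).
toy_rankings = [[0, 1, 1], [1, 0, 0]]
map_score = mean_over_queries(average_precision, toy_rankings)
mrr_score = mean_over_queries(reciprocal_rank, toy_rankings)
```

MRR rewards only the position of the first relevant response, while MAP also accounts for the positions of the remaining relevant responses, which is why the two columns in Table 3 can rank models slightly differently.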