Speech Emotion Recognition Based on an Improved Supervised Manifold Learning Algorithm

Zhang Shi-Qing, Li Le-Min, Zhao Zhi-Jin

Zhang Shi-Qing, Li Le-Min, Zhao Zhi-Jin. Speech Emotion Recognition Based on an Improved Supervised Manifold Learning Algorithm[J]. Journal of Electronics & Information Technology, 2010, 32(11): 2724-2729. doi: 10.3724/SP.J.1146.2009.01430


doi: 10.3724/SP.J.1146.2009.01430
Funding:

Supported by the National Natural Science Foundation of China (60872092)


  • Abstract: To effectively improve the performance of speech emotion recognition, nonlinear dimensionality reduction is needed for speech feature data that lie on a nonlinear manifold embedded in a high-dimensional acoustic feature space. Supervised Locally Linear Embedding (SLLE) is a typical supervised manifold learning algorithm for such nonlinear dimensionality reduction. To address the shortcomings of SLLE, this paper proposes an improved SLLE algorithm that strengthens the discriminating power of the low-dimensional embedded data and possesses optimal generalization ability. The proposed algorithm is used to reduce the dimensionality of 48-dimensional speech emotion feature data containing prosody and voice quality features, and the resulting low-dimensional discriminating embedded features are used to recognize four emotion classes: angry, happy, sad and neutral. Experimental results on a natural emotional speech database show that the algorithm achieves the best recognition accuracy of 90.78% with only 9 embedded features, an improvement of 15.65% over SLLE. The proposed algorithm therefore improves speech emotion recognition results considerably when used for nonlinear dimensionality reduction of speech emotion feature data.
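The abstract describes the improved algorithm only at a high level, but the SLLE baseline it builds on is standard: pairwise distances between samples of different classes are inflated before the usual Locally Linear Embedding (LLE) neighbourhood search and eigen-decomposition. The Python sketch below illustrates that baseline pipeline; it is not the paper's improved variant, and the neighbourhood size, the class-penalty weight alpha, and the synthetic stand-in for the 48-dimensional feature data are illustrative assumptions only.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import pdist, squareform


def slle_embed(X, y, n_neighbors=10, n_components=9, alpha=0.5, reg=1e-3):
    """Baseline SLLE: penalise between-class distances, then run standard LLE.

    X is an (n_samples, n_features) array, y an integer label array.
    alpha in [0, 1] controls how strongly samples from different classes are
    pushed apart during the neighbourhood search (alpha = 0 is plain LLE).
    """
    n = X.shape[0]
    D = squareform(pdist(X))                               # pairwise Euclidean distances
    D = D + alpha * D.max() * (y[:, None] != y[None, :])   # supervised distance matrix
    np.fill_diagonal(D, np.inf)                            # a point is not its own neighbour
    neighbors = np.argsort(D, axis=1)[:, :n_neighbors]

    # Local reconstruction weights are computed from the original features.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]                         # neighbours centred on x_i
        C = Z @ Z.T                                        # local Gram matrix
        C += reg * np.trace(C) * np.eye(n_neighbors)       # regularise a near-singular C
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, neighbors[i]] = w / w.sum()                   # weights sum to one

    # Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    # discarding the constant eigenvector whose eigenvalue is ~0.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, vecs = eigh(M, subset_by_index=[1, n_components])
    return vecs                                            # shape (n, n_components)


if __name__ == "__main__":
    # Synthetic stand-in for the 48-dimensional prosody/voice-quality features
    # and the four emotion classes (angry, happy, sad, neutral).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 48))
    y = rng.integers(0, 4, size=200)
    Z = slle_embed(X, y)
    print(Z.shape)                                         # (200, 9)
```

With real prosody and voice-quality features in place of the synthetic data, the low-dimensional embedding would then be passed to a conventional classifier, for example a nearest-neighbour rule. Note that LLE-style embeddings have no built-in out-of-sample mapping, so test utterances either have to be embedded jointly with the training data or mapped through a separate extension.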
Publication History
  • Received Date: 2009-11-06
  • Revised Date: 2010-04-13
  • Published Date: 2010-11-19
