基于Fisher线性判别分析的语音信号端点检测方法

王明合; 张二华; 唐振民; 许昊

doi:10.11999/JEIT141122

基于Fisher线性判别分析的语音信号端点检测方法

doi: 10.11999/JEIT141122

计量
- 文章访问数: 1656
- HTML全文浏览量: 162
- PDF下载量: 860
- 被引次数: 0
出版历程
- 收稿日期: 2014-08-29
- 修回日期: 2014-12-19
- 刊出日期: 2015-06-19

Voice Activity Detection Based on Fisher Linear Discriminant Analysis

摘要

摘要: 传统的语音端点检测方法对辅音，特别是受到噪声污染的清音部分与背景噪声之间分离能力不足。针对上述问题，该文提出一种基于Fisher线性判别分析的梅尔频率倒谱系数(F-MFCC)端点检测方法。将清音信号和背景噪声视为两类分类问题，采用Fisher准则求解具有判别信息的最佳投影方向，使得投影后的特征参数具有最小类内散度和最大类间散度，从而增大清音与背景噪声的可分离性。在不同语音库上的实验结果表明，F-MFCC能够在不同信噪比和背景噪声条件下提高语音端点检测的准确率。
- 语音处理 /
- 语音端点检测 /
- 梅尔频率倒谱系数 /
- Fisher线性判别分析
Abstract: Traditional Voice Activity Detection (VAD) approaches can not effectively detect consonant as well as noisy unvoiced consonant. To address this problem, this paper proposes a VAD approach Mel Frequency Cepstrum Coefficient (F-MFCC) based on Fisher linear discriminant analysis, in consideration of two-class issue regarding to consonant and background noise. Fisher criterion rule is used to solve the optimal projection vector, building upon which we can minimize the within-class scatter can be minimized and the between-class scatter can be maximized, as a result to enhance separability between consonant and background noise. Extensive experiments are conducted to evaluate the F-MFCC performance. The results demonstrate that, under different SNR and noise conditions, the proposed approach achieves higher VAD accuracy.
- Speech processing /
- Voice Activity Detection (VAD) /
- Mel Frequency Cepstrum Coefficient (MFCC) /
- Fisher linear discriminant analysis

HTML全文

参考文献(20)

Junqua J C. Robustness and cooperative multi-model man-machine communication applications[C]. The Structure of Multimodal Dialogue, Maratea, Italy, 1991: 101-112.

ETSI. Universal Mobile Telecommunication Systems (UMTS); Mandatory Speech Codec speech processing functions, AMR speech codec; Voice Activity Detector VAD[S]. ETSI TS 126 094 v11.0.0(2012-10): 1-26.

Wan Yu-long, Wang Xian-liang, Zhou Ruo-hua, et al.. Enhanced voice activity detection based on automatic segmentation and event classification[J]. Journal of Computational Information Systems, 2014, 10(10): 4169-4177.

宫朝辉, 刁麓弘. 改进共振峰提取的语音端点检测[J]. 计算机辅助设计与图形学学报, 2013, 25(8): 1230-1236.

Gong Zhao-hui and Diao Lu-hong. Improved speech endpoint detection based on formant[J]. Journal of Computer Aided Design Computer Graphics, 2013, 25(8): 1230-1236.

李晔, 张仁志, 崔慧娟, 等. 低信噪比下基于谱熵的语音端点检测算法[J]. 清华大学学报(自然科学版), 2005, 45(10): 1397-1440.

Li Ye, Zhang Ren-zhi, Cui Hui-juan, et al.. Voice activity detection algorithm with low signal-to-noise ratios based on the spectrum entropy[J]. Journal of Tsinghua University (Science and Technology), 2005, 45(10): 1397-1440.

Chen Shi-huang and Wang Jhing-fa. A wavelet-based voice activity detection algorithm in noisy environments[C]. Proceedings of the 9th IEEE International Conference on Electmnics, Circuits and Systems, Dubrovnik, Croatia, 2002: 995-998.

Ghosh P K, Tsiartas A, and Narayanan S. Robust voice activity detection using long-term signal variability[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2011, 19(3): 600-613.

王宏志, 徐玉超, 李美静. 基于Mel频率倒谱参数相似度的语音端点检测算法[J]. 吉林大学学报(工学版), 2012, 42(5): 1331-1335.

Wang Hong-zhi, Xu Yu-chao, and Li Mei-jing. Voice activity detection algorithm based on Mel-frequency cepstrum coefficient (MFCC) similarity[J]. Journal of Jilin University (Engineering and Technology Edition), 2012, 42(5): 1331-1335.

Oh Sang-yeob and Chung Kyung-yong. Improvement of speech detection using ERB feature extraction[J]. Wireless Personal Communications, 2014, 79(4): 2439-2451.

卢志茂, 金辉, 张春祥, 等. 基于HHT和OSF的复杂环境语音端点检测[J]. 电子与信息学报, 2012, 34(1): 213-217.

Lu Zhi-mao, Jin Hui, Zhang Chun-xiang, et al.. Voice activity detection in complex environment based on Hilbert-Huang transform and order statistics filter[J]. Journal of Electronics Information Technology, 2012, 34(1): 213-217.

Deng Shi-wen and Han Ji-qing. Statistical voice activity detection based on sparse representation over learned dictionary[J]. Digital Signal Processing, 2013, 23(4): 1228-1232.

Zhang Yan, Tang Zhen-min, Li Yan-ping, et al.. A hierarchical framework approach for voice activity detection and speech enhancement[J]. The Scientific World Journal, 2014, Vol. 2014: Article ID 723643, 8 pages.

Choi Jae-hun and Chang Joon-hyuk. Dual-microphone voice activity detection technique based on two-step power level difference ratio[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2014, 22(6): 1069-1081.

Ryant N, Liberman M, and Yuan Jia-hong. Speech activity detection on YouTube using deep neural networks[C]. Interspeech: 14th Annual Conference of the International Speech Communication Association, Lyon, France, 2013: 728-731.

Fisher R A. The use of multiple measures in taxonomic problems[J]. Annals of Eugenics, 1936, 7(2): 179-188.

Mak M W and Yu H B. A study of voice activity detection techniques for NIST speaker recognition evaluations[J]. Computer Speech Language, 2014, 28(1): 295-313.

施引文献

资源附件(0)

访问统计