A Two-step Criterion Algorithm of Speaker Segmentation

Yang Jichen (杨继臣); He Qianhua (贺前华); Li Yanxiong (李艳雄); Wang Weining (王伟凝)

doi:10.3724/SP.J.1146.2009.01072

Funding:

Supported by the National Natural Science Foundation of China (60972132, 60602014)

Publication history
- Received: 2009-08-10
- Revised: 2009-12-01
- Published: 2010-08-19

A Two-step Criterion Algorithm of Speaker Segmentation

Abstract

Abstract: To improve the accuracy of Speaker Segmentation (SS), this paper considers the roles of both silence information and gender information in SS and proposes a two-step criterion SS algorithm. After speech segments are separated from the audio stream, Speaker Change Points (SCPs) are located in two steps. The first step relies mainly on pitch information, assisted by gender models, and detects speaker changes between adjacent speakers whose pitch differs greatly. The second step applies a gender-based modified T² criterion to segment adjacent speakers of the same gender whose pitch differs little; for this purpose, the paper proposes a chunk-based algorithm for detecting potential speaker change points. Experimental results show that the proposed algorithm improves segmentation accuracy, with an F1 measure of up to 85.14%. For short (2 s) speech segments, the algorithm reduces the missed detection rate by about 16% compared with the conventional Bayesian Information Criterion (BIC) algorithm.

Keywords:
- Speech signal processing
- Two-step criterion
- Speaker Segmentation (SS)
- Pitch information
- Gender information
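
The two-step decision described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gender models and the gender-based modification of the T² criterion are not specified on this page, so the sketch substitutes a plain median-pitch comparison for step 1 and the standard two-sample Hotelling T² statistic for step 2. The function names `hotelling_t2` and `two_step_is_change` and the thresholds `pitch_gap_hz` and `t2_threshold` are hypothetical and chosen only for illustration.

```python
import numpy as np


def hotelling_t2(x, y):
    """Standard two-sample Hotelling T^2 for feature matrices of shape (frames, dims)."""
    n, m = len(x), len(y)
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Pooled covariance of the two windows (assumes both share one covariance).
    pooled = ((n - 1) * np.cov(x, rowvar=False)
              + (m - 1) * np.cov(y, rowvar=False)) / (n + m - 2)
    pooled = np.atleast_2d(pooled)
    return float(n * m / (n + m) * (diff @ np.linalg.solve(pooled, diff)))


def two_step_is_change(f0_a, f0_b, feat_a, feat_b,
                       pitch_gap_hz=40.0, t2_threshold=50.0):
    """Decide whether two adjacent speech segments come from different speakers.

    f0_a, f0_b: voiced-frame pitch values in Hz for each segment.
    feat_a, feat_b: spectral feature matrices (frames, dims) for each segment.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1: a large gap in median pitch signals a speaker change directly.
    if abs(np.median(f0_a) - np.median(f0_b)) > pitch_gap_hz:
        return True
    # Step 2: similar pitch, so fall back to a T^2 test on the features.
    return hotelling_t2(feat_a, feat_b) > t2_threshold
```

In this sketch the inexpensive pitch comparison handles the easy cases first, and the covariance-based test runs only when the pitch of the two segments is similar, which mirrors the ordering of the two steps in the abstract.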
