Speaker Diarization and Localization Technology Research Based on NIST Evaluation
doi: 10.3724/SP.J.1146.2010.00977
Abstract: Targeting the NIST evaluations run by the U.S. National Institute of Standards and Technology (NIST), this paper builds a speaker diarization and localization speech processing system based on Multiple Distant Microphones (MDM). For the speaker diarization task posed in the NIST Rich Transcription (RT) evaluation, it proposes an improved speaker diarization method that combines time delay estimation with clustering. The method reduces the complexity of speaker diarization and improves its accuracy while keeping performance stable. The paper also proposes a new algorithm that builds a matrix equation from the time delays between adjacent array elements, from which the direction angles of multiple speakers can be obtained. Both algorithms are validated on real speech data recorded in a standard meeting-room environment. The accuracy of the proposed speaker diarization algorithm is close to that of current mainstream speaker diarization systems, and the direction-angle localization error is within 3°. The results show that, under appropriate conditions, an MDM system can serve as a suitable speech input device for multi-party meeting environments.
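The abstract does not spell out how the adjacent-element delay matrix is built. The sketch below is a hedged illustration of the general idea only, not the paper's method. It assumes a far-field (plane-wave) model in which each adjacent-pair delay equals that pair's baseline projected onto the source direction, divided by the speed of sound. It then solves the resulting overdetermined system by ordinary least squares. The function names, the 343 m/s speed of sound, and the delay sign convention are all assumptions for this example. The delays must belong to a single speaker; how the paper separates several speakers is not described in the abstract.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def azimuth_from_adjacent_delays(mic_xy, delays, c=SPEED_OF_SOUND):
    """Least-squares far-field azimuth (degrees) for a planar array (hypothetical helper).

    mic_xy: (x, y) microphone positions in metres, in array order.
    delays: delays[i] = t_i - t_{i+1} in seconds, i.e. arrival time at mic i
            minus arrival time at mic i+1 (one value per adjacent pair).
    Model:  delays[i] * c = (r_{i+1} - r_i) . u,  u = (cos phi, sin phi).
    """
    if len(delays) != len(mic_xy) - 1:
        raise ValueError("need one delay per adjacent microphone pair")
    # Each row of A is a baseline vector r_{i+1} - r_i; b holds c * delay.
    rows = [(mic_xy[i + 1][0] - mic_xy[i][0], mic_xy[i + 1][1] - mic_xy[i][1])
            for i in range(len(delays))]
    b = [c * t for t in delays]
    # Normal equations (A^T A) u = A^T b, solved in closed form for 2 unknowns.
    sxx = sum(dx * dx for dx, _ in rows)
    sxy = sum(dx * dy for dx, dy in rows)
    syy = sum(dy * dy for _, dy in rows)
    bx = sum(dx * bi for (dx, _), bi in zip(rows, b))
    by = sum(dy * bi for (_, dy), bi in zip(rows, b))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("collinear baselines: azimuth is not identifiable")
    ux = (syy * bx - sxy * by) / det
    uy = (sxx * by - sxy * bx) / det
    return math.degrees(math.atan2(uy, ux))


def angle_from_linear_array(spacings, delays, c=SPEED_OF_SOUND):
    """Angle (degrees, 0..180) from the array axis for a linear array (hypothetical helper).

    Model: delays[i] * c = spacings[i] * cos(theta); least squares in cos(theta).
    """
    num = sum(d * t for d, t in zip(spacings, delays))
    den = sum(d * d for d in spacings)
    cos_theta = max(-1.0, min(1.0, c * num / den))  # clamp noisy estimates
    return math.degrees(math.acos(cos_theta))


if __name__ == "__main__":
    # Synthetic check: square 4-mic array, source at 30 degrees, noise-free delays.
    mics = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.2), (0.0, 0.2)]
    phi = math.radians(30.0)
    u = (math.cos(phi), math.sin(phi))
    taus = [((mics[i + 1][0] - mics[i][0]) * u[0]
             + (mics[i + 1][1] - mics[i][1]) * u[1]) / SPEED_OF_SOUND
            for i in range(len(mics) - 1)]
    print(round(azimuth_from_adjacent_delays(mics, taus), 3))  # prints 30.0
```

A linear array only constrains the cosine of the angle, which leaves a front/back ambiguity. For that reason the planar solver rejects collinear baselines, and the one-unknown version covers the linear case.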
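The abstract describes, without details, a diarization method that combines time delay estimation with clustering. One common way to realise that idea, shown here only as a hedged sketch, is to represent each speech segment by its vector of inter-microphone time delays and then cluster those vectors. The delays could be estimated with the generalized cross-correlation method, which is not implemented here. The intuition is that segments from the same seated speaker should share similar delays. Several choices here are assumptions for illustration, and the paper's modified clustering algorithm may differ:

- plain k-means with a known number of speakers;
- farthest-point seeding;
- the names `kmeans` and `diarize_by_tdoa`.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means with a deterministic farthest-point initialisation."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Seed with the first point, then repeatedly add the point farthest from
    # every centre chosen so far.
    centres = [list(points[0])]
    while len(centres) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        centres.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre in Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centres[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no segment changed cluster
        labels = new_labels
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def diarize_by_tdoa(segment_delays, n_speakers):
    """Label each speech segment with a speaker index from its TDOA vector.

    segment_delays: one vector per segment; entry m is the delay (seconds) of
    microphone m+1 relative to a reference microphone. Labels are renumbered
    in order of first appearance so the output is easy to read.
    """
    labels, _ = kmeans(segment_delays, n_speakers)
    first_seen = {}
    return [first_seen.setdefault(lab, len(first_seen)) for lab in labels]


if __name__ == "__main__":
    # Synthetic delays: two speakers at fixed seats, small per-segment jitter.
    segments = [(1.0e-4, 2.0e-4), (1.1e-4, 1.9e-4), (-3.0e-4, 1.0e-4),
                (0.9e-4, 2.1e-4), (-2.9e-4, 1.1e-4), (-3.1e-4, 0.9e-4)]
    print(diarize_by_tdoa(segments, 2))  # prints [0, 0, 1, 0, 1, 1]
```

In meeting data the number of speakers is normally not given and has to be estimated. Delays from short or overlapping segments are also noisy. This sketch is therefore a starting point for the delay-clustering idea, not a complete diarization system.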