Application of Residual Network to Infant Crying Recognition

Xiang XIE, Liqiang ZHANG, Jing WANG

Citation: Xiang XIE, Liqiang ZHANG, Jing WANG. Application of Residual Network to Infant Crying Recognition[J]. Journal of Electronics & Information Technology, 2019, 41(1): 233-239. doi: 10.11999/JEIT180276


doi: 10.11999/JEIT180276
Funds: The National Natural Science Foundation of China (61473041, 11590772, 61571044)
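Author biographies:

    Xiang XIE: male, born in 1976, associate professor. Research interest: speech recognition.

    Liqiang ZHANG: male, born in 1995, master's student. Research interest: speech-based personality perception.

    Jing WANG: female, born in 1980, associate professor. Research interest: audio signal processing.

Corresponding author:

    Xiang XIE, xiexiang@bit.edu.cn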

CLC number: TP391.42

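Abstract:

This paper recognizes infant crying with a deep learning model that combines spectrograms with a residual network. The experiments use a corpus with a balanced proportion of infant crying and non-crying samples and are evaluated by five-fold cross-validation. The spectrogram-based residual network is compared with three other models: a support vector machine (SVM), a convolutional neural network (CNN), and a residual network fed with auditory spectra from a Gammatone filterbank (GT-Resnet). The spectrogram-based residual network gives the best result, with an F1-score of 0.9965, and meets the real-time requirement. This shows that spectrograms reflect the relevant acoustic features directly in the infant crying recognition task, and that a spectrogram-based residual network is a very effective method for this task.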
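The spectrogram input described in the abstract can be illustrated with a minimal pure-Python sketch: the signal is cut into overlapping frames, each frame is Hann-windowed, and the log magnitude of its DFT is kept. The frame length, hop size, naive DFT, and the toy 1 kHz tone are assumptions chosen for readability; this page does not give the paper's actual analysis parameters.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(n^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Return a list of frames, each a list of log-magnitude values in dB."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append([20.0 * math.log10(m + eps) for m in dft_magnitudes(frame)])
    return frames


# Toy input (hypothetical values): a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / sample_rate) for t in range(512)]
spec = log_spectrogram(tone)

# Locate the strongest frequency bin of the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

In practice an FFT routine would replace the naive DFT, and the resulting time-frequency matrix would be resized to the network's input size before being treated as an image.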

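Figure 1  Spectrograms of infant crying, adult speech, and a ringtone

Figure 2  Residual block

Figure 3  Structure of the CNN-5 model

Figure 4  Test-set F1-scores of the 3 models

Figure 5  Test-set F1-scores of residual networks with 3 different depths

Figure 6  Residual network model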
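The figures themselves are not reproduced here, but the idea behind a residual block can be sketched from the general residual learning formulation, y = ReLU(F(x) + x), where F is a small stack of learned layers and x is passed around it by an identity shortcut. In the sketch below two tiny dense layers stand in for the convolutional layers of a real block; the layer sizes and weights are placeholders, not the paper's configuration.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def linear(weights, bias, v):
    """Dense layer: weights is a list of rows, bias a list, v the input vector."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2):
    """Compute relu(F(x) + x), where F is two small layers with a ReLU between."""
    hidden = relu(linear(w1, b1, x))
    fx = linear(w2, b2, hidden)
    # Identity shortcut: add the unchanged input to the learned residual F(x).
    return relu([f + a for f, a in zip(fx, x)])


# With all-zero weights the residual F(x) is zero, so the block reduces to
# relu(x): the shortcut lets the input pass straight through.
zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
no_bias = [0.0, 0.0]
x = [1.5, 2.0]
y_zero = residual_block(x, zeros, no_bias, zeros, no_bias)
y_identity = residual_block(x, identity, no_bias, identity, no_bias)
```

Because the block only has to learn a correction on top of the identity mapping, stacking more blocks does not force the signal through every extra layer, which is the usual motivation for residual networks.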

Table 1  Average dataset size in five-fold cross-validation (number of samples)

                         Infant crying    Non-crying    Total
    Training set size    1243             1148          2391
    Test set size        310              286           596
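A five-fold split of this kind can be sketched as follows. The class totals (1553 crying, 1434 non-crying) are inferred by adding the training and test columns of Table 1; the round-robin dealing, the seed, and the label names are assumptions for illustration, not the authors' procedure.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for held_out in range(len(folds)):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# Class totals inferred from Table 1 (train + test columns); label names are placeholders.
labels = ["cry"] * 1553 + ["non_cry"] * 1434
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
sizes = [(len(train), len(test)) for train, test in splits]
```

Each sample is tested exactly once, and the per-fold sizes come out close to the averages reported in Table 1.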

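Table 2  Feature extraction for the SVM experiments

    Feature type                                      Statistics                                Dimensions
    MFCC with 1st- and 2nd-order differences          mean, variance                            72
    Short-time energy                                 mean, variance                            2
    Fundamental frequency (pitch)                     mean, variance, maximum, minimum, range   5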
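The statistics in Table 2 turn variable-length frame sequences into one fixed-length vector per clip (72 + 2 + 5 = 79 dimensions). A minimal sketch of that pooling step follows. The 36 MFCC tracks (72 dimensions divided by 2 statistics, for example 12 coefficients plus first- and second-order differences), the use of population variance, and all frame values are assumptions for illustration.

```python
import statistics


def pool_basic(track):
    """Mean and (population) variance of one frame-level feature track."""
    return [statistics.fmean(track), statistics.pvariance(track)]


def pool_pitch(track):
    """Mean, variance, maximum, minimum, and range of a pitch track."""
    hi, lo = max(track), min(track)
    return pool_basic(track) + [hi, lo, hi - lo]


def utterance_vector(mfcc_tracks, energy_track, pitch_track):
    """Concatenate pooled statistics into one fixed-length vector."""
    vec = []
    for track in mfcc_tracks:  # one track per MFCC / difference coefficient
        vec.extend(pool_basic(track))
    vec.extend(pool_basic(energy_track))
    vec.extend(pool_pitch(pitch_track))
    return vec


# Placeholder frame-level values for a 4-frame clip (not real data).
mfcc_tracks = [[0.1 * c + 0.01 * t for t in range(4)] for c in range(36)]
energy_track = [0.2, 0.4, 0.6, 0.8]
pitch_track = [300.0, 450.0, 420.0, 380.0]
vec = utterance_vector(mfcc_tracks, energy_track, pitch_track)
```

The resulting 79-dimensional vector has the same length regardless of clip duration, which is what a standard SVM requires as input.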

Table 3  Performance of the SVM with different kernel functions

    Kernel type          F1-score    Parameters
    Linear kernel        0.8717      c=0.68
    Polynomial kernel    0.9316      c=0.30, g=0.35, r=-0.20, d=3.00
    Gaussian kernel      0.9458      c=0.98, g=1.71
    Sigmoid kernel       0.8874      c=5.00, g=0.04, r=1.80
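The parameter names in Table 3 match the conventions of LIBSVM, so the four kernels can be sketched with LIBSVM's kernel definitions. This assumes that g is gamma, r is coef0, d is the polynomial degree, and c is the penalty parameter C (which affects training, not the kernel value). The two input vectors are toy placeholders.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return dot(u, v)


def polynomial_kernel(u, v, g, r, d):
    """K(u, v) = (g * u . v + r) ** d"""
    return (g * dot(u, v) + r) ** d


def gaussian_kernel(u, v, g):
    """K(u, v) = exp(-g * ||u - v||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-g * sq_dist)


def sigmoid_kernel(u, v, g, r):
    """K(u, v) = tanh(g * u . v + r)"""
    return math.tanh(g * dot(u, v) + r)


# Toy 2-D feature vectors (placeholders), evaluated with the Table 3 settings.
u, v = [1.0, 0.0], [0.5, 0.5]
k_lin = linear_kernel(u, v)
k_poly = polynomial_kernel(u, v, g=0.35, r=-0.20, d=3)
k_rbf = gaussian_kernel(u, v, g=1.71)
k_sig = sigmoid_kernel(u, v, g=0.04, r=1.80)
```

The Gaussian kernel, which scores highest in Table 3, depends only on the distance between the two vectors and equals 1 when they coincide.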

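Table 4  Performance of CNNs with different numbers of layers

    CNN model    Input feature              F1-score
    CNN-4-MEL    40×128 Mel spectrogram     0.9184
    CNN-4-227    227×227 spectrogram        0.9233
    CNN-4        128×128 spectrogram        0.9229
    CNN-5-227    227×227 spectrogram        0.9482
    CNN-5        128×128 spectrogram        0.9489
    CNN-6        128×128 spectrogram        0.9365
    CNN-7        128×128 spectrogram        0.9398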
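The F1-score used to compare the models is the harmonic mean of precision and recall. The sketch below computes it from predicted and true labels. Treating "cry" as the positive class is an assumption, and the example labels are made up for illustration; this page does not state how the score is aggregated over the five folds.

```python
def confusion_counts(y_true, y_pred, positive="cry"):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """F1 = 2PR / (P + R), with P = tp / (tp + fp) and R = tp / (tp + fn)."""
    if tp == 0:
        return 0.0  # no correct positive predictions, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Made-up labels for five clips (not data from the paper).
y_true = ["cry", "cry", "cry", "non_cry", "non_cry"]
y_pred = ["cry", "cry", "non_cry", "cry", "non_cry"]
tp, fp, fn = confusion_counts(y_true, y_pred)
score = f1_score(tp, fp, fn)
```

Because the corpus is roughly balanced between crying and non-crying samples, the F1-score here is not dominated by either class.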

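Table 5  Comparison of model performance

    Model          Network structure    Input feature           Model size (MB)    Average test time (s)    F1-score
    SVM            single layer         statistical features    0.7                0.0910+0.0001            0.9458
    CNN-5          4conv+1fc            spectrogram             10                 0.1251+0.0093            0.9489
    Resnet15       3resblock+1fc        spectrogram             48                 0.1251+0.0281            0.9836
    Resnet19       4resblock+1fc        spectrogram             87                 0.1251+0.0315            0.9965
    Resnet27       6resblock+1fc        spectrogram             171                0.1251+0.0355            0.9965
    GT-Resnet15    3resblock+1fc        auditory spectrum       48                 0.1933+0.0218            0.9803
    GT-Resnet19    4resblock+1fc        auditory spectrum       87                 0.1933+0.0237            0.9782
    GT-Resnet27    6resblock+1fc        auditory spectrum       171                0.1933+0.0285            0.9719

    Note: average test time = feature extraction time + model prediction time.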
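Following the note under Table 5, the total test time per sample is the sum of the two reported terms. The short sketch below adds them up and reports how much of the total is spent on feature extraction; the timing values are copied from the table, while the variable names are illustrative.

```python
# (feature extraction time, model prediction time) in seconds, copied from Table 5.
timings = {
    "SVM": (0.0910, 0.0001),
    "CNN-5": (0.1251, 0.0093),
    "Resnet15": (0.1251, 0.0281),
    "Resnet19": (0.1251, 0.0315),
    "Resnet27": (0.1251, 0.0355),
    "GT-Resnet15": (0.1933, 0.0218),
    "GT-Resnet19": (0.1933, 0.0237),
    "GT-Resnet27": (0.1933, 0.0285),
}

# Total test time per model, rounded to the table's four decimal places.
totals = {name: round(feat + pred, 4) for name, (feat, pred) in timings.items()}

# Fraction of the total time spent on feature extraction.
feature_share = {name: feat / (feat + pred) for name, (feat, pred) in timings.items()}

fastest = min(totals, key=totals.get)
for name in timings:
    print(f"{name:12s} total={totals[name]:.4f} s  feature share={feature_share[name]:.1%}")
```

For every model in the table, feature extraction accounts for most of the total time, so Resnet19 costs only about 0.02 s more per sample than CNN-5 despite the much larger network.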
Publication history
  • Received: 2018-03-23
  • Revised: 2018-09-04
  • Published online: 2018-09-11
  • Issue date: 2019-01-01
