Parkinson's Disease Detection Method Based on Cross-Language Acoustic Analysis

JI Wei¹, WANG Chuanyu¹, WU Di¹, LI Yun², ZHENG Huifen³

doi: 10.11999/JEIT230981 cstr: 32379.14.JEIT230981

1. School of Communication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
3. Affiliated Geriatric Hospital of Nanjing Medical University, Nanjing 210024, China

Funds: The Basic Scientific (Natural Science) Major Program of the Higher Education Institutions of Jiangsu Province, China (21KJA520003)

Author biographies:
JI Wei: female, Ph.D., professor, master's supervisor. Research interests include the intersection of machine learning and signal processing, wireless communications, and communication signal processing.

WANG Chuanyu: male, master's student. Research interest: the intersection of machine learning and signal processing.

WU Di: male, master's student. Research interest: the intersection of machine learning and signal processing.

LI Yun: male, Ph.D., professor, doctoral supervisor. Research interests include machine learning, feature selection, and information security.

ZHENG Huifen: female, Ph.D., chief physician. Research interests: Parkinson's disease and related movement disorders.

Corresponding author:
LI Yun　liyun@njupt.edu.cn

CLC number: TN911.7; TP391.4
Publication history
- Received: 2023-09-07
- Revised: 2023-12-04
- Published online: 2023-12-13
- Issue date: 2024-02-29

Abstract

Abstract: Speech-based Parkinson's disease detection is non-intrusive, low-cost and non-invasive. Most publicly available Parkinson's disease speech datasets are collected in a single language, so they tend to be small and to show little variation in the native-language pronunciation characteristics of the subjects. A Parkinson's disease detection model trained on a single-language dataset suffers performance degradation when applied to cross-language speech data. To avoid the impact of language differences and improve detection performance in cross-language scenarios, this paper introduces the ideas of adversarial transfer learning and feature decoupling and proposes a Parkinson's disease Cross-Language Speech Analysis Model (CLSAM). First, a Transformer encoder block based on multi-head self-attention is cascaded with a multi-layer neural network to form a feature extractor module, which preliminarily decouples the raw Fbank speech features extracted from source-domain and target-domain speech into two vectors: a domain-invariant pathological information representation vector and a domain information representation vector. Second, a dual adversarial training module with inconsistent target tasks is designed to explicitly separate domain-invariant pathological information from domain information. Finally, the domain-invariant pathological information in cross-language speech data is extracted for Parkinson's disease detection. The effectiveness of the proposed method is verified with ten-fold cross-validation on the publicly available MaxLittle Parkinson's disease speech dataset and on a self-collected Parkinson's disease speech dataset. Experimental results show that, compared with traditional machine learning methods and existing transfer learning algorithms, the proposed model markedly improves accuracy, sensitivity and F1 score in cross-language scenarios.
- Cross-language speech analysis
- Parkinson's disease
- Adversarial transfer learning
- Feature decoupling
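The abstract reports ten-fold cross-validation but this page does not describe how the folds were built. The following is a minimal sketch of one plausible setup, a stratified split over subjects, using the subject counts of the self-collected dataset (68 PD, 17 HC). The function name `stratified_kfold`, the round-robin assignment and the subject-level granularity are all assumptions for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions roughly preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    # Deal each class's shuffled indices round-robin so every fold gets a share.
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
        yield train_idx, test_idx


# Hypothetical label vector: 1 = PD, 0 = HC (counts taken from the self-collected dataset).
labels = [1] * 68 + [0] * 17
splits = list(stratified_kfold(labels, k=10))
```

With only 17 healthy controls, stratification matters: a purely random split could leave some test folds with no HC subject at all, which would make sensitivity and F1 unstable from fold to fold.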

Figure 1 Overall framework of the cross-language speech analysis model

Figure 2 Transformer encoder block based on multi-head self-attention

Algorithm 1 Cross-language Parkinson's disease detection algorithm based on adversarial transfer learning
Input: source-domain dataset $ {{D}}_{\mathrm{s}} $ and target-domain dataset $ {{D}}_{\mathrm{t}} $
Output: learnable parameters $ {\tilde {\boldsymbol{\theta}} _{{\text{Te}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e2}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d2}}}} $
Repeat
    // Feature learning stage
    For each batch of samples drawn from the source-domain data:
        Compute the loss $ {L_{\mathrm{s}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}1}},{{\boldsymbol{\theta}} _{{\mathrm{d}}1}}) $;
        Compute the loss $ {L_{\mathrm{d}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}2}},{{\boldsymbol{\theta}} _{{\mathrm{d}}2}}) $;
        Compute the loss $ {L_{{\text{diff}}}}({\boldsymbol{V}}_{\mathrm{s}}^{\mathrm{e}};{\boldsymbol{V}}_{\mathrm{s}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (10) and update $ {\tilde {\boldsymbol{\theta}} _{{\text{Te}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e2}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d2}}}} $
    End
    For each batch of samples drawn from the target-domain data:
        Compute the loss $ {L_{\mathrm{s}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}1}},{{\boldsymbol{\theta}} _{{\mathrm{d}}1}}) $;
        Compute the loss $ {L_{\mathrm{d}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}2}},{{\boldsymbol{\theta}} _{{\mathrm{d}}2}}) $;
        Compute the loss $ {L_{{\text{diff}}}}({\boldsymbol{V}}_{\mathrm{t}}^{\mathrm{e}};{\boldsymbol{V}}_{\mathrm{t}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (10) and update $ {\tilde {\boldsymbol{\theta}} _{{\text{Te}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e2}}}},{\tilde {\boldsymbol{\theta}} _{{\text{d2}}}} $
    End
    // Adversarial transfer stage
    For each sample from the source or target domain, with parameters $ {{\boldsymbol{\theta}} _{{\mathrm{Te}}}} $, $ {{\boldsymbol{\theta}} _{{\mathrm{e}}1}} $ and $ {{\boldsymbol{\theta}} _{{\mathrm{d}}2}} $ fixed:
        Compute the loss $ {L_{\mathrm{s}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}1}},{{\boldsymbol{\theta}} _{{\mathrm{d}}1}}) $;
        Compute the loss $ {L_{\mathrm{d}}}({{\boldsymbol{\theta}} _{{\mathrm{Te}}}},{{\boldsymbol{\theta}} _{{\mathrm{e}}2}},{{\boldsymbol{\theta}} _{{\mathrm{d}}2}}) $;
        Compute the loss $ {L_{{\text{diff}}}}({\boldsymbol{V}}_{\mathrm{s}}^{\mathrm{e}};{\boldsymbol{V}}_{\mathrm{s}}^{\mathrm{d}}) $ or $ {L_{{\text{diff}}}}({\boldsymbol{V}}_{\mathrm{t}}^{\mathrm{e}};{\boldsymbol{V}}_{\mathrm{t}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (11) and update $ {\tilde {\boldsymbol{\theta}} _{{\text{d1}}}},{\tilde {\boldsymbol{\theta}} _{{\text{e2}}}} $
    End
Until the model converges
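The two stages of Algorithm 1 differ in which parameter groups are allowed to change. The sketch below illustrates only that freezing schedule, using plain SGD on scalar stand-ins. The string keys, the function names and the placeholder gradients are hypothetical: the actual gradients come from Eqs. (10) and (11), which are not reproduced on this page.

```python
# Parameter groups named after Algorithm 1; the string keys are illustrative.
ALL_GROUPS = ("theta_Te", "theta_e1", "theta_d1", "theta_e2", "theta_d2")
# Groups held fixed during the adversarial transfer stage (per Algorithm 1).
FROZEN_IN_ADVERSARIAL = ("theta_Te", "theta_e1", "theta_d2")


def trainable_groups(phase):
    """Return the set of parameter groups updated in the given phase."""
    if phase == "feature_learning":
        return set(ALL_GROUPS)
    if phase == "adversarial_transfer":
        return set(ALL_GROUPS) - set(FROZEN_IN_ADVERSARIAL)
    raise ValueError(f"unknown phase: {phase}")


def sgd_step(params, grads, phase, lr=0.001):
    """Apply one plain SGD update to the groups trainable in `phase`."""
    active = trainable_groups(phase)
    return {
        name: (value - lr * grads[name] if name in active else value)
        for name, value in params.items()
    }


# Scalar stand-ins for the real weight tensors and their gradients.
params = {name: 1.0 for name in ALL_GROUPS}
grads = {name: 0.5 for name in ALL_GROUPS}

after_feature = sgd_step(params, grads, "feature_learning")
after_adversarial = sgd_step(params, grads, "adversarial_transfer")
```

In the adversarial transfer stage only `theta_d1` and `theta_e2` move, which matches the final update line of Algorithm 1. The default learning rate of 0.001 is the value listed in Table 3.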


Table 1 Statistics of the MaxLittle dataset

	Male		Female		Total
Subject group	PD	HC	PD	HC	PD	HC
Number of subjects	22	4	11	6	33	10
Mean age (variance)	67.2(9.3)	61(8.6)	67.2(9.3)	61(8.6)	67.2(9.3)	61(8.6)
Age range	48–85	46–72	48–85	46–72	48–85	46–72


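Table 2 Statistics of the self-collected Parkinson's disease speech dataset

	Male		Female		Total
Subject group	PD	HC	PD	HC	PD	HC
Number of subjects	49	8	19	9	68	17
Mean age (variance)	69.3(9.5)	66.5(7.2)	69.8(8.2)	65.3(6.8)	69.4(9.2)	65.9(7.0)
Age range	46–88	58–77	56–84	53–74	46–88	53–77
Mean disease duration (variance)	5.9(3.6)	0	5.4(3.1)	0	5.8(3.4)	0
Hoehn-Yahr (HY) stage	1–4	0	1–4	0	1–4	0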


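Table 3 CLSAM model parameter settings

Network structure parameter	Value
X_s	361×40
X_t	361×40
Transformer encoder block Q, K, V vector dimension	64
Transformer encoder block attention heads	2
Transformer encoder block depth	6
Multi-layer feed-forward neural network	[32,32]
domain_vec	16
p_vec	16
Domain discriminator network D₁	[32,16,2]
Domain discriminator network D₂	[16,2]
Parkinson's disease detection module E₁	[16,2]
Parkinson's disease detection module E₂	[32,16,2]
Epochs	120
Learning rate	0.001
Batch size	36
Optimizer	SGD
Dropout	0.1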
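For readers reimplementing the model, the settings in Table 3 can be collected into a single immutable configuration object. This is a sketch only: the class name `CLSAMConfig`, the field names and the helper `batches_per_epoch` are hypothetical, and reading 361×40 as (frames × Fbank bins) is an assumption that this page does not confirm.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CLSAMConfig:
    """Hyperparameters copied from Table 3; field names are illustrative."""
    # Shape of X_s and X_t; interpreting it as (frames, Fbank bins) is an assumption.
    input_shape: Tuple[int, int] = (361, 40)
    qkv_dim: int = 64
    num_heads: int = 2
    encoder_depth: int = 6
    feed_forward_layers: Tuple[int, ...] = (32, 32)
    domain_vec_dim: int = 16
    p_vec_dim: int = 16
    d1_layers: Tuple[int, ...] = (32, 16, 2)   # domain discriminator D1
    d2_layers: Tuple[int, ...] = (16, 2)       # domain discriminator D2
    e1_layers: Tuple[int, ...] = (16, 2)       # PD detection module E1
    e2_layers: Tuple[int, ...] = (32, 16, 2)   # PD detection module E2
    epochs: int = 120
    learning_rate: float = 0.001
    batch_size: int = 36
    optimizer: str = "SGD"
    dropout: float = 0.1

    def batches_per_epoch(self, num_samples: int) -> int:
        """Number of mini-batches needed to cover num_samples once."""
        return math.ceil(num_samples / self.batch_size)


config = CLSAMConfig()
```

Freezing the dataclass keeps the values from drifting between experiments; a variant run can be created with `dataclasses.replace(config, dropout=0.2)` instead of mutating the shared object.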


Table 4 Performance comparison with traditional machine learning models (%)

Model	Acc.	Sen.	F1
CLSAM	86.69	85.98	84.71
RF(s)	79.86	77.41	78.88
RF(t)	78.62	77.32	77.26
RF(s-t)	76.81	75.25	74.75
RF(t-s)	76.38	75.36	74.46
RF(st)	79.15	78.35	78.18
SVM(s)	79.52	77.53	78.35
SVM(t)	77.34	77.61	78.15
SVM(s-t)	75.72	74.46	74.26
SVM(t-s)	75.35	73.45	72.86
SVM(st)	78.95	76.86	75.68
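The three columns in the comparison tables are accuracy (Acc.), sensitivity (Sen.) and F1 score. As a reminder of how these relate, here is a minimal sketch that computes them from binary predictions. It assumes PD is the positive class and that F1 is computed for that class only; this page does not state the averaging convention the authors used, and the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, f1) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Sensitivity (recall): share of true PD cases that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return accuracy, sensitivity, f1


# Toy labels (1 = PD, 0 = HC), not data from the paper.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
accuracy, sensitivity, f1 = binary_metrics(y_true, y_pred)
```

Because both datasets contain far more PD subjects than healthy controls, accuracy alone can look good for a model that over-predicts PD, which is one reason to read it together with sensitivity and F1.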


Table 5 Performance comparison with transfer learning models (%)

Model	Acc.	Sen.	F1
CLSAM	86.69	85.98	84.71
DAN	80.83	81.86	81.56
DSAN	83.65	83.82	82.61
DANN	82.78	82.98	82.81
CADAN	84.10	83.22	83.56
TFFN	85.64	84.58	83.89
DSN	83.60	82.84	83.15


Table 6 Ablation study (%)

Model	Acc.	Sen.	F1
CLSAM	86.69	85.98	84.71
CLSAM (without dual adversarial training)	82.78	82.98	82.31
CLSAM (without feature orthogonality constraint)	85.23	83.74	83.15
CLSAM (with HSIC constraint)	85.96	84.85	84.17
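The ablation above removes a feature orthogonality constraint, which corresponds to the $L_{\text{diff}}$ term in Algorithm 1. The exact form of that term is not given on this page. The sketch below assumes one common choice, the squared Frobenius norm of $V_e^{\mathsf{T}} V_d$ over a batch, as used in domain separation networks (the DSN baseline); the function name `difference_loss` and the toy batches are hypothetical.

```python
def difference_loss(v_e, v_d):
    """Squared Frobenius norm of V_e^T V_d.

    v_e and v_d are batches given as lists of rows (one row per sample),
    e.g. pathological vectors and domain vectors for the same samples.
    """
    if len(v_e) != len(v_d):
        raise ValueError("both batches must contain the same number of samples")
    dim_e, dim_d = len(v_e[0]), len(v_d[0])
    total = 0.0
    for i in range(dim_e):
        for j in range(dim_d):
            # Entry (i, j) of V_e^T V_d: correlation of two feature columns over the batch.
            entry = sum(row_e[i] * row_d[j] for row_e, row_d in zip(v_e, v_d))
            total += entry * entry
    return total


# Feature columns that are orthogonal across the batch give zero loss.
orthogonal = difference_loss([[1.0, 0.0], [-1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]])
# Identical representations are maximally overlapping and are penalised.
overlapping = difference_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

Minimising a term of this kind pushes the two representation vectors to carry non-overlapping information, which is the stated purpose of separating domain-invariant pathological information from domain information.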


References (25)

[1]	GULLAPALLI A S and MITTAL V K. Early detection of Parkinson’s disease through speech features and machine learning: a review[C]. ICT with Intelligent Applications: Proceedings of ICTIS, Singapore, 2022: 203–212. doi: 10.1007/978-981-16-4177-0_22.
[2]	BENBA A, JILBAB A, SANDABAD S, et al. Voice signal processing for detecting possible early signs of Parkinson’s disease in patients with rapid eye movement sleep behavior disorder[J]. International Journal of Speech Technology, 2019, 22(1): 121–129. doi: 10.1007/s10772-018-09588-0.
[3]	JI Wei, YANG Mingqi, LI Yun, et al. Parkinson's disease detection method based on masked self-supervised speech feature extraction[J]. Journal of Electronics & Information Technology, 2023, 45(10): 3502–3510. doi: 10.11999/JEIT221041.
[4]	SUPHINNAPONG P, PHOKAEWVARANGKUL O, THUBTHONG N, et al. Objective vowel sound characteristics and their relationship with motor dysfunction in Asian Parkinson’s disease patients[J]. Journal of the Neurological Sciences, 2021, 426: 117487. doi: 10.1016/j.jns.2021.117487.
[5]	HSU S C, JIAO Yishan, MCAULIFFE M J, et al. Acoustic and perceptual speech characteristics of native Mandarin speakers with Parkinson's disease[J]. The Journal of the Acoustical Society of America, 2017, 141(3): EL293–EL299. doi: 10.1121/1.4978342.
[6]	KOVAC D, MEKYSKA J, GALAZ Z, et al. Multilingual analysis of speech and voice disorders in patients with Parkinson's Disease[C]. The 44th International Conference on Telecommunications and Signal Processing, Brno, Czech Republic, 2021: 273–277. doi: 10.1109/TSP52935.2021.9522597.
[7]	VÁSQUEZ-CORREA J C, ARIAS-VERGARA T, RIOS-URREGO C D, et al. Convolutional neural networks and a transfer learning strategy to classify Parkinson's Disease from speech in three different languages[C]. 24th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Havana, Cuba, 2019: 697–706. doi: 10.1007/978-3-030-33904-3_66.
[8]	KIM Y and CHOI Y. A cross-language study of acoustic predictors of speech intelligibility in individuals with Parkinson's Disease[J]. Journal of Speech, Language, and Hearing Research, 2017, 60(9): 2506–2518. doi: 10.1044/2017_JSLHR-S-16-0121.
[9]	NISHIO M and NIIMI S. Comparison of speaking rate, articulation rate and alternating motion rate in dysarthric speakers[J]. Folia Phoniatrica et Logopaedica, 2006, 58(2): 114–131. doi: 10.1159/000089612.
[10]	OROZCO-ARROYAVE J R, HÖNIG F, ARIAS-LONDOÑO J D, et al. Automatic detection of Parkinson's disease in running speech spoken in three different languages[J]. The Journal of the Acoustical Society of America, 2016, 139(1): 481–500. doi: 10.1121/1.4939739.
[11]	YEO E J, CHOI K, KIM S, et al. Cross-lingual dysarthria severity classification for English, Korean, and Tamil[C]. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 2022: 566–574. doi: 10.23919/APSIPAASC55919.2022.9980124.
[12]	VÁSQUEZ-CORREA J C, RIOS-URREGO C D, ARIAS-VERGARA T, et al. Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages[J]. Pattern Recognition Letters, 2021, 150: 272–279. doi: 10.1016/j.patrec.2021.04.011.
[13]	JIANG Junguang, SHU Yang, WANG Jianmin, et al. Transferability in deep learning: A survey[J]. arXiv: 2201.05867, 2022. doi: 10.48550/arXiv.2201.05867.
[14]	GHIFARY M, KLEIJN W B, and ZHANG Mengjie. Domain adaptive neural networks for object recognition[C]. 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, 2014: 898–904. doi: 10.1007/978-3-319-13560-1_76.
[15]	ZHU Yongchun, ZHUANG Fuzhen, WANG Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(4): 1713–1722. doi: 10.1109/tnnls.2020.2988928.
[16]	GANIN Y, USTINOVA E, AJAKAN H, et al. Domain-adversarial training of neural networks[J]. The Journal of Machine Learning Research, 2016, 17(1): 2096–2030. doi: 10.1007/978-3-319-58347-1_10.
[17]	LONG Mingsheng, CAO Zhangjie, WANG Jianmin, et al. Conditional adversarial domain adaptation[C]. The 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 2018: 1647–1657. doi: 10.5555/3326943.3327094.
[18]	CAI Ruichu, LI Zijian, WEI Pengfei, et al. Learning disentangled semantic representation for domain adaptation[C]. International Joint Conferences on Artificial Intelligence (IJCAI), Macao, China, 2019: 2060–2066. doi: 10.24963/ijcai.2019/285.
[19]	TSANAS A, LITTLE M A, MCSHARRY P E, et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease[J]. IEEE Transactions on Biomedical Engineering, 2012, 59(5): 1264–1271. doi: 10.1109/TBME.2012.2183367.
[20]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010. doi: 10.5555/3295222.3295349.
[21]	OROZCO-ARROYAVE J R, VÁSQUEZ-CORREA J C, VARGAS-BONILLA J F, et al. NeuroSpeech: An open-source software for Parkinson’s speech analysis[J]. Digital Signal Processing, 2018, 77: 207–221. doi: 10.1016/j.dsp.2017.07.004.
[22]	CAI D, HE X, HAN J, et al. Orthogonal Laplacianfaces for face recognition[J]. IEEE Transactions on Image Processing, 2006, 15(11): 3608–3614. doi: 10.1109/TIP.2006.881945.
[23]	BOUSMALIS K, TRIGEORGIS G, SILBERMAN N, et al. Domain separation networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 343–351. doi: 10.5555/3157096.3157135.
[24]	LI Yiyang, WANG Shengsheng, WANG Bilin, et al. Transferable feature filtration network for multi-source domain adaptation[J]. Knowledge-Based Systems, 2023, 260: 110113. doi: 10.1016/J.KNOSYS.2022.110113.
[25]	SONG L, SMOLA A, GRETTON A, et al. Supervised feature selection via dependence estimation[C]. The 24th International Conference on Machine Learning, 2007: 823–830. doi: 10.1145/1273496.1273600.