Parkinson's Disease Detection Method Based on Cross-Language Acoustic Analysis
-
Abstract: Speech-based Parkinson's disease (PD) detection has the advantages of being non-intrusive, low-cost and non-invasive. Most publicly available PD speech datasets are monolingual: their data volume is limited, and the native-language pronunciation characteristics of their subjects vary little. A PD detection model trained on a single-language dataset therefore loses performance when it is applied to speech in another language. To avoid the impact of language differences and improve detection performance in cross-language scenarios, this paper introduces the ideas of adversarial transfer learning and feature decoupling and proposes a Parkinson's disease Cross-Language Speech Analysis Model (CLSAM). First, Transformer encoder blocks based on multi-head self-attention are cascaded with a multi-layer neural network to form a feature extractor module, which preliminarily decouples the raw Fbank features extracted from source-domain and target-domain speech into two vectors: a domain-invariant pathological information representation vector and a domain information representation vector. Second, a dual adversarial training module with inconsistent target tasks is designed to explicitly separate the domain-invariant pathological information from the domain information. Finally, the domain-invariant pathological information extracted from cross-language speech data is used for PD detection. The effectiveness of the proposed method is verified with ten-fold cross-validation on the public MaxLittle PD speech dataset and on a self-collected PD speech dataset. Experimental results show that, compared with traditional machine learning methods and existing transfer learning algorithms, the proposed model clearly improves accuracy, sensitivity and F1 score in cross-language scenarios.
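The evaluation protocol named in the abstract (ten-fold cross-validation, reported as accuracy, sensitivity and F1 score) can be sketched as follows. This is a minimal sketch under stated assumptions: PD is the positive class (label 1), F1 is the positive-class F1, and folds are stratified by label. The source does not say whether the folds were stratified or whether splitting was done per subject or per recording, and the function names `confusion_counts`, `acc_sen_f1` and `stratified_k_folds` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN), treating `positive` (assumed: PD) as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def acc_sen_f1(y_true: Sequence[int], y_pred: Sequence[int],
               positive: int = 1) -> Tuple[float, float, float]:
    """Accuracy, sensitivity (recall on the positive class) and positive-class F1."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    acc = (tp + tn) / (tp + fp + fn + tn)
    sen = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * sen / (precision + sen) if precision + sen else 0.0
    return acc, sen, f1


def stratified_k_folds(labels: Sequence[int], k: int = 10,
                       seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds, dealing each class out round-robin
    so that class proportions and fold sizes stay as even as possible."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0  # running counter shared across classes keeps fold sizes balanced
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds
```

For example, if splitting were done per subject, `stratified_k_folds([1] * 33 + [0] * 10)` places exactly one of the 10 HC subjects of Table 1 in each fold. Each fold then serves once as the test set while the other nine are used for training, and the three metrics could be averaged over the ten folds.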
-
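Algorithm 1 interleaves a batch-wise feature-learning stage, in which all parameter groups are updated via Eq. (10), with a sample-wise adversarial-transfer stage, in which $ \boldsymbol{\theta}_{\mathrm{Te}} $, $ \boldsymbol{\theta}_{\mathrm{e}1} $ and $ \boldsymbol{\theta}_{\mathrm{d}2} $ are fixed and only $ \tilde{\boldsymbol{\theta}}_{\mathrm{d}1} $ and $ \tilde{\boldsymbol{\theta}}_{\mathrm{e}2} $ are updated via Eq. (11). The following Python sketch captures only that control flow. The `step` and `is_converged` callbacks are hypothetical, the loss terms and Eqs. (10) and (11) are left to `step`, and using the 120 epochs of Table 3 as the round limit is an assumption.

```python
from typing import Callable, List, Sequence

# Parameter groups of Algorithm 1 (labels only). Assumed mapping: Te is the
# Transformer-based extractor, e1/e2 the PD detection modules E1/E2 and
# d1/d2 the domain discriminators D1/D2 of Table 3.
ALL_GROUPS = ("theta_Te", "theta_e1", "theta_d1", "theta_e2", "theta_d2")
ADVERSARIAL_GROUPS = ("theta_d1", "theta_e2")  # the only groups updated in stage 2

# Hypothetical callback: step(stage, domain, batch, trainable_groups).
# It is expected to compute L_s, L_d and L_diff and apply Eq. (10) or Eq. (11).
StepFn = Callable[[str, str, List, tuple], None]


def train_clsam(source_batches: Sequence[List], target_batches: Sequence[List],
                step: StepFn, is_converged: Callable[[int], bool],
                max_rounds: int = 120) -> int:
    """Alternate the two stages of Algorithm 1; return the number of rounds run."""
    for round_idx in range(max_rounds):
        # Feature-learning stage: all source batches, then all target batches,
        # with every parameter group trainable.
        for domain, batches in (("s", source_batches), ("t", target_batches)):
            for batch in batches:
                step("feature_learning", domain, batch, ALL_GROUPS)
        # Adversarial-transfer stage: one sample at a time from either domain;
        # theta_Te, theta_e1 and theta_d2 stay fixed.
        for domain, batches in (("s", source_batches), ("t", target_batches)):
            for batch in batches:
                for sample in batch:
                    step("adversarial_transfer", domain, [sample], ADVERSARIAL_GROUPS)
        if is_converged(round_idx):
            return round_idx + 1
    return max_rounds
```

Keeping the schedule separate from the loss computations makes it easy to check that the update for Eq. (11) can only touch the two adversarial groups.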
Algorithm 1 Cross-language Parkinson's disease detection based on adversarial transfer learning
Input: source-domain dataset $ D_{\mathrm{s}} $ and target-domain dataset $ D_{\mathrm{t}} $
Output: learnable parameters $ \tilde{\boldsymbol{\theta}}_{\mathrm{Te}},\tilde{\boldsymbol{\theta}}_{\mathrm{e}1},\tilde{\boldsymbol{\theta}}_{\mathrm{d}1},\tilde{\boldsymbol{\theta}}_{\mathrm{e}2},\tilde{\boldsymbol{\theta}}_{\mathrm{d}2} $
Repeat
    // Feature learning stage
    For each batch of samples drawn from the source-domain data:
        Compute loss $ L_{\mathrm{s}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}1},\boldsymbol{\theta}_{\mathrm{d}1}) $;
        Compute loss $ L_{\mathrm{d}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}2},\boldsymbol{\theta}_{\mathrm{d}2}) $;
        Compute loss $ L_{\mathrm{diff}}(\boldsymbol{V}_{\mathrm{s}}^{\mathrm{e}};\boldsymbol{V}_{\mathrm{s}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (10) and update $ \tilde{\boldsymbol{\theta}}_{\mathrm{Te}},\tilde{\boldsymbol{\theta}}_{\mathrm{e}1},\tilde{\boldsymbol{\theta}}_{\mathrm{d}1},\tilde{\boldsymbol{\theta}}_{\mathrm{e}2},\tilde{\boldsymbol{\theta}}_{\mathrm{d}2} $
    End For
    For each batch of samples drawn from the target-domain data:
        Compute loss $ L_{\mathrm{s}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}1},\boldsymbol{\theta}_{\mathrm{d}1}) $;
        Compute loss $ L_{\mathrm{d}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}2},\boldsymbol{\theta}_{\mathrm{d}2}) $;
        Compute loss $ L_{\mathrm{diff}}(\boldsymbol{V}_{\mathrm{t}}^{\mathrm{e}};\boldsymbol{V}_{\mathrm{t}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (10) and update $ \tilde{\boldsymbol{\theta}}_{\mathrm{Te}},\tilde{\boldsymbol{\theta}}_{\mathrm{e}1},\tilde{\boldsymbol{\theta}}_{\mathrm{d}1},\tilde{\boldsymbol{\theta}}_{\mathrm{e}2},\tilde{\boldsymbol{\theta}}_{\mathrm{d}2} $
    End For
    // Adversarial transfer stage
    For each sample from the source or target domain:
        Fix parameters $ \boldsymbol{\theta}_{\mathrm{Te}} $, $ \boldsymbol{\theta}_{\mathrm{e}1} $ and $ \boldsymbol{\theta}_{\mathrm{d}2} $;
        Compute loss $ L_{\mathrm{s}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}1},\boldsymbol{\theta}_{\mathrm{d}1}) $;
        Compute loss $ L_{\mathrm{d}}(\boldsymbol{\theta}_{\mathrm{Te}},\boldsymbol{\theta}_{\mathrm{e}2},\boldsymbol{\theta}_{\mathrm{d}2}) $;
        Compute loss $ L_{\mathrm{diff}}(\boldsymbol{V}_{\mathrm{s}}^{\mathrm{e}};\boldsymbol{V}_{\mathrm{s}}^{\mathrm{d}}) $ or $ L_{\mathrm{diff}}(\boldsymbol{V}_{\mathrm{t}}^{\mathrm{e}};\boldsymbol{V}_{\mathrm{t}}^{\mathrm{d}}) $;
        Compute the gradients according to Eq. (11) and update $ \tilde{\boldsymbol{\theta}}_{\mathrm{d}1},\tilde{\boldsymbol{\theta}}_{\mathrm{e}2} $
    End For
Until the model converges

Table 1 Statistics of the MaxLittle dataset
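| Subject category | Male PD | Male HC | Female PD | Female HC | Total PD | Total HC |
|---|---|---|---|---|---|---|
| Number of subjects | 22 | 4 | 11 | 6 | 33 | 10 |
| Mean age (SD) | 67.2 (9.3) | 61 (8.6) | 67.2 (9.3) | 61 (8.6) | 67.2 (9.3) | 61 (8.6) |
| Age range | 48–85 | 46–72 | 48–85 | 46–72 | 48–85 | 46–72 |

Table 2 Statistics of the self-collected Parkinson's disease speech dataset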
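| Subject category | Male PD | Male HC | Female PD | Female HC | Total PD | Total HC |
|---|---|---|---|---|---|---|
| Number of subjects | 49 | 8 | 19 | 9 | 68 | 17 |
| Mean age (SD) | 69.3 (9.5) | 66.5 (7.2) | 69.8 (8.2) | 65.3 (6.8) | 69.4 (9.2) | 65.9 (7.0) |
| Age range | 46–88 | 58–77 | 56–84 | 53–74 | 46–88 | 53–77 |
| Mean disease duration (SD) | 5.9 (3.6) | 0 | 5.4 (3.1) | 0 | 5.8 (3.4) | 0 |
| Hoehn and Yahr (HY) stage | 1–4 | 0 | 1–4 | 0 | 1–4 | 0 |

Table 3 CLSAM parameter settings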
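| Network parameter | Value |
|---|---|
| X_s | 361×40 |
| X_t | 361×40 |
| Q, K, V vector dimension of the Transformer encoder block | 64 |
| Number of attention heads in the Transformer encoder block | 2 |
| Depth of the Transformer encoder | 6 |
| Multi-layer feed-forward network | [32,32] |
| domain_vec | 16 |
| p_vec | 16 |
| Domain discriminator network D1 | [32,16,2] |
| Domain discriminator network D2 | [16,2] |
| PD detection module E1 | [16,2] |
| PD detection module E2 | [32,16,2] |
| Epochs | 120 |
| Learning rate | 0.001 |
| Batch size | 36 |
| Optimizer | SGD |
| Dropout | 0.1 |

Table 4 Performance comparison with traditional machine learning models (%)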
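| Model | Acc. | Sen. | F1 |
|---|---|---|---|
| CLSAM | 86.69 | 85.98 | 84.71 |
| RF(s) | 79.86 | 77.41 | 78.88 |
| RF(t) | 78.62 | 77.32 | 77.26 |
| RF(s-t) | 76.81 | 75.25 | 74.75 |
| RF(t-s) | 76.38 | 75.36 | 74.46 |
| RF(st) | 79.15 | 78.35 | 78.18 |
| SVM(s) | 79.52 | 77.53 | 78.35 |
| SVM(t) | 77.34 | 77.61 | 78.15 |
| SVM(s-t) | 75.72 | 74.46 | 74.26 |
| SVM(t-s) | 75.35 | 73.45 | 72.86 |
| SVM(st) | 78.95 | 76.86 | 75.68 |

Table 5 Performance comparison with transfer learning models (%)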
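| Model | Acc. | Sen. | F1 |
|---|---|---|---|
| CLSAM | 86.69 | 85.98 | 84.71 |
| DAN | 80.83 | 81.86 | 81.56 |
| DSAN | 83.65 | 83.82 | 82.61 |
| DANN | 82.78 | 82.98 | 82.81 |
| CADAN | 84.10 | 83.22 | 83.56 |
| TFFN | 85.64 | 84.58 | 83.89 |
| DSN | 83.60 | 82.84 | 83.15 |

Table 6 Ablation study (%)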
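| Model | Acc. | Sen. | F1 |
|---|---|---|---|
| CLSAM | 86.69 | 85.98 | 84.71 |
| CLSAM (without dual adversarial training) | 82.78 | 82.98 | 82.31 |
| CLSAM (without feature orthogonality constraint) | 85.23 | 83.74 | 83.15 |
| CLSAM (with HSIC constraint) | 85.96 | 84.85 | 84.17 |
-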
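Table 6 compares CLSAM with and without the feature orthogonality constraint, and with an HSIC constraint in its place; Algorithm 1 refers to this separation term as $ L_{\mathrm{diff}} $. Its exact formula is not given in this section, so the following pure-Python sketch assumes the squared-Frobenius difference loss of Domain Separation Networks [23] for the orthogonality constraint, and the standard biased empirical HSIC estimator $ \mathrm{tr}(KHLH)/(n-1)^2 $ for the HSIC variant. All function names are hypothetical; a real implementation would operate on framework tensors so that the penalty is differentiable.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]  # rows = samples in a batch, columns = feature dimensions


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain triple loop; adequate for the small illustrative batches used here.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def orthogonality_loss(v_p: Matrix, v_d: Matrix) -> float:
    """Squared Frobenius norm of V_p^T V_d (DSN-style difference loss)."""
    prod = matmul(transpose(v_p), v_d)
    return sum(x * x for row in prod for x in row)


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, y))


def hsic(v_p: Matrix, v_d: Matrix,
         kernel: Callable[[Sequence[float], Sequence[float]], float] = linear_kernel) -> float:
    """Biased empirical HSIC between two batch representations: tr(K H L H) / (n - 1)^2."""
    n = len(v_p)
    k_p = [[kernel(x, y) for y in v_p] for x in v_p]  # Gram matrix K of V_p
    k_d = [[kernel(x, y) for y in v_d] for x in v_d]  # Gram matrix L of V_d
    # Centering matrix H = I - (1/n) * 1 1^T
    h = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    khlh = matmul(matmul(matmul(k_p, h), k_d), h)
    return sum(khlh[i][i] for i in range(n)) / (n - 1) ** 2
```

With a linear kernel, $ \mathrm{tr}(KHLH) $ equals the squared Frobenius norm of $ \boldsymbol{X}_{\mathrm{c}}^{\top}\boldsymbol{Y}_{\mathrm{c}} $ for column-centered features, so linear-kernel HSIC is the orthogonality loss applied after mean-centering, scaled by $ 1/(n-1)^2 $. A constant offset in either representation therefore changes the orthogonality loss but not HSIC.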
[1] GULLAPALLI A S and MITTAL V K. Early detection of Parkinson's disease through speech features and machine learning: a review[C]. ICT with Intelligent Applications: Proceedings of ICTIS, Singapore, 2022: 203–212. doi: 10.1007/978-981-16-4177-0_22.
[2] BENBA A, JILBAB A, SANDABAD S, et al. Voice signal processing for detecting possible early signs of Parkinson's disease in patients with rapid eye movement sleep behavior disorder[J]. International Journal of Speech Technology, 2019, 22(1): 121–129. doi: 10.1007/s10772-018-09588-0.
[3] JI Wei, YANG Mingqi, LI Yun, et al. Parkinson's disease detection method based on masked self-supervised speech feature extraction[J]. Journal of Electronics & Information Technology, 2023, 45(10): 3502–3510. doi: 10.11999/JEIT221041.
[4] SUPHINNAPONG P, PHOKAEWVARANGKUL O, THUBTHONG N, et al. Objective vowel sound characteristics and their relationship with motor dysfunction in Asian Parkinson's disease patients[J]. Journal of the Neurological Sciences, 2021, 426: 117487. doi: 10.1016/j.jns.2021.117487.
[5] HSU S C, JIAO Yishan, MCAULIFFE M J, et al. Acoustic and perceptual speech characteristics of native Mandarin speakers with Parkinson's disease[J]. The Journal of the Acoustical Society of America, 2017, 141(3): EL293–EL299. doi: 10.1121/1.4978342.
[6] KOVAC D, MEKYSKA J, GALAZ Z, et al. Multilingual analysis of speech and voice disorders in patients with Parkinson's Disease[C]. The 44th International Conference on Telecommunications and Signal Processing, Brno, Czech Republic, 2021: 273–277. doi: 10.1109/TSP52935.2021.9522597.
[7] VÁSQUEZ-CORREA J C, ARIAS-VERGARA T, RIOS-URREGO C D, et al. Convolutional neural networks and a transfer learning strategy to classify Parkinson's Disease from speech in three different languages[C]. 24th Iberoamerican Congress on Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, Havana, Cuba, 2019: 697–706. doi: 10.1007/978-3-030-33904-3_66.
[8] KIM Y and CHOI Y. A cross-language study of acoustic predictors of speech intelligibility in individuals with Parkinson's Disease[J]. Journal of Speech, Language, and Hearing Research, 2017, 60(9): 2506–2518. doi: 10.1044/2017_JSLHR-S-16-0121.
[9] NISHIO M and NIIMI S. Comparison of speaking rate, articulation rate and alternating motion rate in dysarthric speakers[J]. Folia Phoniatrica et Logopaedica, 2006, 58(2): 114–131. doi: 10.1159/000089612.
[10] OROZCO-ARROYAVE J R, HÖNIG F, ARIAS-LONDOÑO J D, et al. Automatic detection of Parkinson's disease in running speech spoken in three different languages[J]. The Journal of the Acoustical Society of America, 2016, 139(1): 481–500. doi: 10.1121/1.4939739.
[11] YEO E J, CHOI K, KIM S, et al. Cross-lingual dysarthria severity classification for English, Korean, and Tamil[C]. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Chiang Mai, Thailand, 2022: 566–574. doi: 10.23919/APSIPAASC55919.2022.9980124.
[12] VÁSQUEZ-CORREA J C, RIOS-URREGO C D, ARIAS-VERGARA T, et al. Transfer learning helps to improve the accuracy to classify patients with different speech disorders in different languages[J]. Pattern Recognition Letters, 2021, 150: 272–279. doi: 10.1016/j.patrec.2021.04.011.
[13] JIANG Junguang, SHU Yang, WANG Jianmin, et al. Transferability in deep learning: A survey[J]. arXiv: 2201.05867, 2022. doi: 10.48550/arXiv.2201.05867.
[14] GHIFARY M, KLEIJN W B, and ZHANG Mengjie. Domain adaptive neural networks for object recognition[C]. 13th Pacific Rim International Conference on Artificial Intelligence, Gold Coast, QLD, Australia, 2014: 898–904. doi: 10.1007/978-3-319-13560-1_76.
[15] ZHU Yongchun, ZHUANG Fuzhen, WANG Jindong, et al. Deep subdomain adaptation network for image classification[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(4): 1713–1722. doi: 10.1109/TNNLS.2020.2988928.
[16] GANIN Y, USTINOVA E, AJAKAN H, et al. Domain-adversarial training of neural networks[J]. The Journal of Machine Learning Research, 2016, 17(1): 2096–2030. doi: 10.1007/978-3-319-58347-1_10.
[17] LONG Mingsheng, CAO Zhangjie, WANG Jianmin, et al. Conditional adversarial domain adaptation[C]. The 32nd International Conference on Neural Information Processing Systems, Montréal, Canada, 2018: 1647–1657. doi: 10.5555/3326943.3327094.
[18] CAI Ruichu, LI Zijian, WEI Pengfei, et al. Learning disentangled semantic representation for domain adaptation[C]. International Joint Conferences on Artificial Intelligence (IJCAI), Macao, China, 2019: 2060–2066. doi: 10.24963/ijcai.2019/285.
[19] TSANAS A, LITTLE M A, MCSHARRY P E, et al. Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease[J]. IEEE Transactions on Biomedical Engineering, 2012, 59(5): 1264–1271. doi: 10.1109/TBME.2012.2183367.
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]. Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, USA, 2017: 6000–6010. doi: 10.5555/3295222.3295349.
[21] OROZCO-ARROYAVE J R, VÁSQUEZ-CORREA J C, VARGAS-BONILLA J F, et al. NeuroSpeech: An open-source software for Parkinson's speech analysis[J]. Digital Signal Processing, 2018, 77: 207–221. doi: 10.1016/j.dsp.2017.07.004.
[22] CAI D, HE X, HAN J, et al. Orthogonal Laplacianfaces for face recognition[J]. IEEE Transactions on Image Processing, 2006, 15(11): 3608–3614. doi: 10.1109/TIP.2006.881945.
[23] BOUSMALIS K, TRIGEORGIS G, SILBERMAN N, et al. Domain separation networks[C]. The 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 2016: 343–351. doi: 10.5555/3157096.3157135.
[24] LI Yiyang, WANG Shengsheng, WANG Bilin, et al. Transferable feature filtration network for multi-source domain adaptation[J]. Knowledge-Based Systems, 2023, 260: 110113. doi: 10.1016/j.knosys.2022.110113.
[25] SONG L, SMOLA A, GRETTON A, et al. Supervised feature selection via dependence estimation[C]. The 24th International Conference on Machine Learning, 2007: 823–830. doi: 10.1145/1273496.1273600.