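Unsupervised Anomaly Detection of Hydro-Turbine Generator Acoustics by Integrating Pre-Trained Audio Large Model and Density Estimation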

WU Ting; WEN Shulin; YAN Zhaoli; FU Gaoyuan; LI Linfeng; LIU Xudu; CHENG Xiaobin; YANG Jun

doi: 10.11999/JEIT250934 cstr: 32379.14.JEIT250934




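About the authors:
WU Ting: male, Ph.D. candidate. Research interests: acoustic anomaly detection and unsupervised deep learning.

WEN Shulin: female, Ph.D. Research interests: acoustic condition monitoring of hydropower equipment, fault diagnosis and early warning, acoustic target recognition and tracking, and active noise control.

YAN Zhaoli: male, Ph.D., professor, doctoral supervisor. Research interests: equipment condition monitoring and fault diagnosis.

FU Gaoyuan: male, M.S. Research interests: acoustic condition monitoring of hydropower equipment, and fault diagnosis and early warning.

LI Linfeng: male, M.S. Research interests: acoustic condition monitoring of hydropower equipment, fault diagnosis and early warning, and active noise control.

LIU Xudu: male, Ph.D. Research interests: acoustic condition monitoring of hydropower equipment, fault diagnosis and early warning, and structural health monitoring and safety assessment.

CHENG Xiaobin: male, Ph.D., professor, doctoral supervisor. Research interests: intelligent acoustic signal processing, big data analysis, and acoustic event monitoring.

YANG Jun: male, Ph.D., professor, doctoral supervisor. Research interests: acoustic signal processing, array signal processing, and sound and vibration control.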

Corresponding author:
CHENG Xiaobin, xb_cheng@mail.ioa.ac.cn

CLC number: TN912; TP183; TP391.41; TM315
Publication history
- Received: 2025-09-19
- Revised: 2025-11-10
- Accepted: 2025-11-12
- Published online: 2025-11-18
- Issue date: 2026-02-10


WU Ting^{1, 2, 4},
WEN Shulin^{3, 4},
YAN Zhaoli^{5, 1},
FU Gaoyuan^{3, 4},
LI Linfeng^{3, 4},
LIU Xudu^{3, 4},
CHENG Xiaobin^{1, 2} (corresponding author),
YANG Jun^{1, 2}

1. State Key Laboratory of Acoustics and Marine Information, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China
2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
3. China Yangtze Power Co., Ltd., Wuhan 443000, China
4. Hubei Technology Innovation Center for Smart Hydropower, Wuhan 430000, China
5. College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing 100029, China

Funds: The project of China Yangtze Power Co., Ltd. (Z152302048)


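Abstract: As the core power equipment of a hydropower station, the hydro-turbine generator unit must operate safely and stably for the whole station to do so. In recent years, non-contact acoustic measurement has attracted wide attention as an effective detection approach. However, abnormal acoustic signals from hydro-turbine generator units in actual operation are difficult to collect, which limits the application of traditional anomaly detection methods and supervised classification strategies in this field. To address these challenges, this paper proposes an unsupervised acoustic anomaly detection method for hydro-turbine generators that combines a pre-trained large audio model with density-estimation-based k-nearest neighbors (k-NN). First, the effectiveness of the general-purpose audio features extracted by the pre-trained audio model is verified for anomaly detection. Then, a fine-tuning strategy that combines attentive statistical pooling with learning-rate warm-up is designed to transfer and optimize the model. In the inference stage, a density-estimation-based k-NN is designed to provide a robust distance measure. Experimental results show that the method achieves a harmonic mean of multiple metrics (Hmean) of 98.7% in the wind tunnel environment and up to 99.9% in the slip-ring room, providing a practical, high-performing solution for acoustic anomaly detection in hydropower stations.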
- Pre-trained large audio model /
- Hydro-turbine generator unit /
- Anomaly detection /
- Unsupervised deep learning
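The abstract above describes a fine-tuning strategy that combines attentive statistical pooling (ASP) with warm-up; ASP turns frame-level encoder outputs into a single segment-level embedding. The following is a minimal pure-Python sketch of the general ASP idea, not the authors' implementation. It assumes a single hypothetical linear scoring layer (`w`, `b`) per frame, whereas common ASP variants score frames with a small hidden layer and a nonlinearity.

```python
import math


def attentive_stat_pool(frames, w, b=0.0, eps=1e-12):
    """Attentive statistical pooling over T frame vectors of dimension D.

    frames: list of T lists, each holding D floats (frame-level features).
    w, b:   hypothetical parameters of one linear scoring layer, e_t = w.h_t + b.
    Returns a 2*D list: [attention-weighted mean, attention-weighted std].
    """
    # 1) one scalar attention energy per frame
    energies = [sum(wi * hi for wi, hi in zip(w, h)) + b for h in frames]
    # 2) softmax over time; subtracting the max avoids overflow in exp()
    top = max(energies)
    exps = [math.exp(e - top) for e in energies]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(frames[0])
    # 3) attention-weighted mean per feature dimension
    mean = [sum(a * h[d] for a, h in zip(alphas, frames)) for d in range(dim)]
    # 4) attention-weighted std: sqrt(E[h^2] - E[h]^2), clamped at eps
    std = []
    for d in range(dim):
        second = sum(a * h[d] ** 2 for a, h in zip(alphas, frames))
        std.append(math.sqrt(max(second - mean[d] ** 2, eps)))
    return mean + std


# Zero scoring weights give uniform attention, i.e. plain mean and std pooling.
print(attentive_stat_pool([[1.0, 2.0], [3.0, 4.0]], w=[0.0, 0.0]))  # → [2.0, 3.0, 1.0, 1.0]
```

With non-zero weights, frames with higher energies dominate both statistics, which is how ASP emphasizes informative frames. Concatenating mean and standard deviation doubles the embedding size; whether a projection layer follows is not stated in the abstract.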
Abstract:
Objective: Hydro-Turbine Generator Units (HTGUs) require reliable early fault detection to maintain operational safety and reduce maintenance costs. Acoustic signals offer a non-intrusive and sensitive means of monitoring, but their use is limited by complex structural acoustics, strong background noise, and the scarcity of abnormal data. An unsupervised acoustic anomaly detection framework is presented that integrates a large-scale pretrained audio model with density-based k-nearest neighbors (k-NN) estimation. The framework is designed to detect anomalies using only normal data while remaining robust and generalizing well across the different operating conditions of HTGUs.

Methods: Time-domain signals are Z-score normalized and converted to filter-bank (Fbank) features, and random masking is applied to improve robustness and generalization. A large-scale pretrained BEATs model serves as the feature encoder, and an Attentive Statistical Pooling (ASP) module aggregates frame-level representations into discriminative segment-level embeddings by emphasizing informative frames. To improve class separability, an ArcFace loss replaces the conventional classification layer during training, and a warm-up learning-rate strategy is adopted to ensure stable convergence. During inference, density-based k-NN estimation is applied to the learned embeddings to detect acoustic anomalies.

Results and Discussions: The effectiveness of the proposed framework is examined using data collected from eight real-world machines. As shown in Fig. 7 and Table 2, large-scale pretrained audio representations distinguish abnormal sounds better than traditional features. With the proposed FED-KE algorithm, the framework achieves high scores on all six metrics, with the harmonic mean (Hmean) reaching 98.7% in the wind tunnel and 99.9% in the slip-ring room, indicating strong robustness under complex industrial conditions. As shown in Table 4, ablation studies confirm the complementary contributions of feature enhancement, ASP-based representation refinement, and density-based k-NN inference. Because the framework requires only normal data for training, it reduces dependence on scarce fault labels and is more practical to apply. Remaining challenges include the computational cost of the pretrained model and the absence of multimodal fusion, which will be addressed in future work.

Conclusions: An unsupervised acoustic anomaly detection framework is proposed for HTGUs to address the scarcity of fault samples and the complexity of industrial acoustic environments. A pretrained large-scale audio foundation model is adopted and fine-tuned with turbine-specific strategies to better model the acoustics of normal operation. During inference, a density-estimation-based k-NN mechanism detects abnormal patterns using only normal data. Experiments on real-world hydropower station recordings show high detection accuracy and strong generalization across different operating conditions, outperforming conventional supervised approaches. The framework introduces foundation-model-based audio representation learning to the hydro-turbine domain, provides an efficient adaptation strategy tailored to turbine acoustics, and integrates a robust density-based anomaly scoring mechanism. Together, these components reduce dependence on labeled anomalies and support practical deployment for intelligent condition monitoring. Future work will examine model compression, such as knowledge distillation, to enable on-device deployment, and will explore semi-/self-supervised learning and multimodal fusion to improve robustness, scalability, and cross-station adaptability.
- Pretrained audio models /
- Hydropower generating units /
- Anomaly detection /
- Unsupervised deep learning
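The abstract does not spell out the density-based k-NN score. One plausible reading, sketched below as an assumption rather than the authors' formula, is a local-density-normalized k-NN distance in the spirit of the local outlier factor: a test embedding's mean distance to its k nearest normal training embeddings is divided by those neighbors' own mean k-NN distance within the normal set. Cosine distance is assumed because ArcFace-trained embeddings are compared by angle; all function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def knn(query, bank, k, exclude=None):
    """(distance, index) pairs of the k nearest bank entries, nearest first."""
    pairs = sorted(
        (cosine_distance(query, x), i) for i, x in enumerate(bank) if i != exclude
    )
    return pairs[:k]


def density_knn_score(query, bank, k=3):
    """Hypothetical density-normalized k-NN anomaly score.

    bank holds embeddings of normal training clips only (needs len(bank) > k).
    About 1 means the query is as dense as its neighborhood; larger = more anomalous.
    """
    neighbors = knn(query, bank, k)
    d_query = sum(d for d, _ in neighbors) / len(neighbors)
    # local scale of each neighbor: its own mean k-NN distance inside the bank
    local = []
    for _, i in neighbors:
        own = knn(bank[i], bank, k, exclude=i)
        local.append(sum(d for d, _ in own) / len(own))
    reference = sum(local) / len(local)
    return d_query / max(reference, 1e-12)


# Normal embeddings spread around angle 0 on the unit circle.
bank = [(math.cos(t), math.sin(t)) for t in (0.0, 0.1, -0.1, 0.2, -0.2)]
print(density_knn_score((1.0, 0.05), bank))  # small: inside the normal cluster
print(density_knn_score((0.0, 1.0), bank))   # large: 90 degrees away
```

Because the score is a ratio, it adapts to regions where normal embeddings are naturally sparse or dense, which is the usual motivation for density-normalized k-NN over a plain k-NN distance. A decision threshold on the score would still have to be chosen from normal data.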


Fig. 1 Overall framework of the training and inference processes of the FED-KE algorithm
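Fig. 1 covers training as well as inference, and the abstract states that fine-tuning uses a warm-up learning-rate strategy for stable convergence. The exact shape of the schedule is not given here; the sketch below assumes a common choice, a linear warm-up followed by cosine decay, with hypothetical parameter names and values.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Learning rate at a 0-based optimizer step (hypothetical schedule).

    Steps 0..warmup_steps-1 ramp linearly up to base_lr; the remaining steps
    follow a half cosine from base_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)  # hold at min_lr after total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 4, 9, 10, 55, 100):
    print(s, warmup_cosine_lr(s, base_lr=1e-4, warmup_steps=10, total_steps=100))
```

Starting from a small learning rate keeps the first updates from disturbing the pretrained encoder weights before the newly added pooling and margin layers have settled. In PyTorch, the same function with `base_lr=1.0` could serve as the multiplier returned by the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`.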

Fig. 2 Network architecture of FED-KE
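Fig. 2 shows the FED-KE network, and the abstract states that an ArcFace loss replaces the conventional classification layer during training. ArcFace normalizes both the embedding and each class weight vector, adds an angular margin m to the angle between the embedding and its own class, and scales all cosines by s. The pure-Python sketch below shows that computation for one sample. The class labels are assumed to be device identities, the values s = 30 and m = 0.5 are illustrative assumptions rather than the paper's settings, and the fallback commonly used when theta + m exceeds pi is omitted.

```python
import math


def _unit(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def arcface_logits(embedding, class_weights, label, s=30.0, m=0.5):
    """ArcFace logits: s*cos(theta_j) for other classes, s*cos(theta_y + m) for the label."""
    x = _unit(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_t = sum(a * b for a, b in zip(x, _unit(w)))
        cos_t = max(-1.0, min(1.0, cos_t))  # keep acos() inside its domain
        if j == label:
            cos_t = math.cos(math.acos(cos_t) + m)  # additive angular margin
        logits.append(s * cos_t)
    return logits


def cross_entropy(logits, label):
    """Softmax cross-entropy computed with the log-sum-exp trick."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]


weights = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical class centers
emb = [0.9, 0.3]                     # embedding closer to class 0
print(cross_entropy(arcface_logits(emb, weights, 0), 0))
print(cross_entropy(arcface_logits(emb, weights, 0, m=0.0), 0))  # no margin: lower loss
```

The margin makes the target class harder to satisfy, so training pulls embeddings of each class into tighter angular clusters; that compactness is what a later distance-based anomaly score can exploit.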

Fig. 3 Experimental site at the hydropower station

Fig. 4 Time-domain waveforms and short-time spectrograms of normal signals from six different devices
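Fig. 4 shows spectrograms of normal signals. According to the abstract, each time-domain signal is Z-score normalized, converted to Fbank features, and randomly masked before it enters the encoder. The sketch below covers only the normalization and masking steps, under stated assumptions: the mask sizes, the use of one time mask and one frequency mask, and masking by zeroing are illustrative choices, not the paper's settings. Fbank extraction itself is left to an audio library; for example, `torchaudio.compliance.kaldi.fbank` computes Kaldi-compatible filter-bank features.

```python
import random
import statistics


def zscore(signal, eps=1e-8):
    """Zero-mean, unit-variance normalization of a 1-D waveform."""
    mu = statistics.fmean(signal)
    sd = statistics.pstdev(signal)
    return [(x - mu) / (sd + eps) for x in signal]


def random_mask(feats, max_t=10, max_f=8, rng=None):
    """Zero one random run of frames and one random band of bins (SpecAugment-style).

    feats: list of T frames, each a list of F filter-bank values.
    Returns a masked copy; the input is left unchanged.
    """
    rng = rng or random.Random()
    n_t, n_f = len(feats), len(feats[0])
    out = [row[:] for row in feats]
    t_w = rng.randint(0, min(max_t, n_t))  # mask width in frames (may be 0)
    t0 = rng.randint(0, n_t - t_w)
    for t in range(t0, t0 + t_w):
        out[t] = [0.0] * n_f
    f_w = rng.randint(0, min(max_f, n_f))  # mask width in bins (may be 0)
    f0 = rng.randint(0, n_f - f_w)
    for row in out:
        for f in range(f0, f0 + f_w):
            row[f] = 0.0
    return out


wave = zscore([0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5])
masked = random_mask([[1.0] * 16 for _ in range(20)], rng=random.Random(0))
```

Masking is applied only during training; at inference the unmasked features would be used.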

Fig. 5 Waveforms and short-time spectrograms of abnormal signals in the wind tunnel and slip-ring room

Fig. 6 Anomaly scores of BEATs general-purpose features for test samples from the wind tunnel and slip-ring room

Fig. 7 Anomaly scores of FED-KE features for test samples from the wind tunnel and slip-ring room
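Figs. 6 and 7 plot anomaly scores, and Table 2 summarizes such scores with AUC and pAUC. The sketch below computes both from raw scores, treating anomalous clips as the positive class. It assumes a DCASE-style definition of pAUC, the ROC area over false-positive rates in [0, p] divided by p, with p = 0.1 as an illustrative value; the paper's p is not stated here. Note that scikit-learn's `roc_auc_score(..., max_fpr=p)` returns a McClish-standardized partial AUC, so its values differ from this plain normalization.

```python
def roc_points(labels, scores):
    """ROC curve as (fpr, tpr) points; label 1 = anomalous (positive), 0 = normal."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)  # highest score first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # consume every sample tied at this score before emitting a point
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def partial_auc(labels, scores, max_fpr=1.0):
    """ROC area over fpr in [0, max_fpr], divided by max_fpr (max_fpr=1.0 gives AUC)."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # linearly interpolate the tpr at the cut-off
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0  # trapezoid rule
    return area / max_fpr


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(partial_auc(labels, scores))               # AUC  → 0.75
print(partial_auc(labels, scores, max_fpr=0.1))  # pAUC → 0.5
```

pAUC only rewards separation among the highest-scoring normal clips, which matters in monitoring because false alarms at a low threshold are what operators see most.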

Fig. 8 Feature visualization of normal and abnormal signals using FED-KE features for test samples from the wind tunnel and slip-ring room

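Table 1 Numbers of normal and abnormal samples for the six devices

Device	Normal samples	Abnormal samples
Wind tunnel	1890	68
Slip-ring room	1386	736
Upper guide bearing	122	-
Turbine pit	2218	-
Spiral case door	1110	-
Draft tube cone door	2831	-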

Table 2 Test results on wind tunnel and slip-ring room signals (%; each cell: wind tunnel / slip-ring room)

Category	Algorithm	AUC	pAUC	Accuracy	Recall	F1-score	Precision	Hmean
Signal processing	MFCC	92.3 / 79.8	89.8 / 72.7	94.5 / 53.8	85.3 / 45.2	89.2 / 62.2	93.6 / 99.7	90.6 / 64.4
Signal processing	WPD	90.3 / 96.0	83.4 / 89.2	90.3 / 89.7	69.1 / 89.1	79.0 / 93.6	92.2 / 98.5	83.2 / 92.5
Reconstruction error	AE	72.3 / 57.0	51.3 / 48.3	77.9 / 69.6	95.6 / 99.3	81.3 / 76.6	70.7 / 62.3	72.2 / 65.4
Supervised learning	IEFNet-B	99.2 / 97.4	95.6 / 86.1	98.5 / 97.2	100.0 / 98.7	82.4 / 96.1	70.0 / 93.6	89.4 / 94.7
Supervised learning	AlexNet	81.9 / 71.5	47.4 / 56.0	77.6 / 66.2	100.0 / 94.6	24.1 / 66.0	13.7 / 50.7	35.2 / 64.9
Supervised learning	ResNet34	98.6 / 97.0	92.8 / 84.3	96.9 / 97.7	100.0 / 100.0	70.0 / 96.7	53.9 / 93.7	81.0 / 94.6
Supervised learning	Xception	90.4 / 84.8	61.1 / 65.7	84.2 / 78.9	100.0 / 100.0	31.1 / 76.7	18.4 / 62.2	44.2 / 76.1
Supervised learning	1D ResNet18	94.9 / 98.4	73.4 / 91.3	93.9 / 96.7	100.0 / 98.7	53.9 / 95.4	36.8 / 92.4	66.3 / 95.4
Deep feature extraction	1D ResNet18	95.7 / 96.4	91.7 / 91.9	94.6 / 87.4	85.3 / 85.9	89.2 / 92.0	93.6 / 99.1	91.5 / 91.9
Pre-trained large audio model	BEATs	99.0 / 99.2	95.3 / 97.0	94.9 / 97.0	94.1 / 97.3	90.8 / 98.2	87.7 / 99.2	93.5 / 98.0
Pre-trained large audio model	FED-KE	99.9 / 100.0	99.8 / 100.0	98.8 / 99.8	100.0 / 99.7	97.8 / 99.9	95.8 / 100.0	98.7 / 99.9
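The Hmean column is consistent with the harmonic mean of the six preceding metrics: recomputing it from the FED-KE row gives 98.7 for the wind tunnel and 99.9 for the slip-ring room, matching the table. The check below reproduces that arithmetic with Python's standard library, together with the usual relation F1 = 2PR/(P + R). Small last-digit differences elsewhere in the table are expected, because the inputs are already rounded to one decimal.

```python
from statistics import harmonic_mean

METRICS = ("AUC", "pAUC", "Accuracy", "Recall", "F1-score", "Precision")


def hmean(row):
    """Harmonic mean of the six metrics in one table row (all in %)."""
    return harmonic_mean([row[m] for m in METRICS])


def f1_score(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# FED-KE row of Table 2: wind tunnel and slip-ring room values
fedke_wind = dict(zip(METRICS, (99.9, 99.8, 98.8, 100.0, 97.8, 95.8)))
fedke_ring = dict(zip(METRICS, (100.0, 100.0, 99.8, 99.7, 99.9, 100.0)))
print(round(hmean(fedke_wind), 1))      # → 98.7
print(round(hmean(fedke_ring), 1))      # → 99.9
print(round(f1_score(70.0, 100.0), 1))  # IEFNet-B, wind tunnel → 82.4
```

Because a harmonic mean is dominated by its smallest term, a single weak metric drags Hmean down sharply; this is why AlexNet, with a precision of 13.7% on the wind tunnel, scores only 35.2 despite 100% recall.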

Table 3 Test results on wind tunnel and slip-ring room signals under different reverberation and SNR conditions (%; each cell: wind tunnel / slip-ring room; "-" means the corresponding degradation is not applied)

RT60 (s)	SNR (dB)	AUC	pAUC	Accuracy	Recall	F1-score	Precision	Hmean
0.2	-	99.0 / 100.0	95.9 / 100.0	96.5 / 99.4	94.1 / 99.3	93.4 / 99.7	92.8 / 100.0	95.2 / 99.7
0.5	-	96.8 / 99.7	94.2 / 98.6	94.9 / 98.9	91.2 / 99.3	90.5 / 99.3	89.9 / 99.3	92.8 / 99.2
0.8	-	99.2 / 99.6	97.8 / 97.7	98.4 / 99.0	94.1 / 99.1	97.0 / 99.4	100.0 / 99.7	97.7 / 99.1
-	0	90.7 / 93.7	80.6 / 86.5	88.3 / 77.6	77.9 / 73.6	77.9 / 84.7	77.9 / 99.6	81.9 / 85.0
-	10	96.8 / 99.0	89.9 / 96.6	92.6 / 93.8	88.2 / 93.2	86.3 / 96.2	84.5 / 99.4	89.5 / 96.3
-	20	96.6 / 98.9	94.8 / 96.6	96.5 / 94.6	91.2 / 94.4	93.2 / 96.7	95.4 / 99.1	94.6 / 96.7
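Table 3 evaluates robustness under added noise (SNR of 0, 10 and 20 dB) and reverberation (RT60 of 0.2, 0.5 and 0.8 s). How the degraded signals were generated is not stated here. One common approach, sketched below as an assumption, scales a noise recording so that the mixture reaches the target SNR and convolves the clean signal with a synthetic room impulse response whose envelope decays by 60 dB over RT60. All function names are hypothetical.

```python
import math
import random


def power(x):
    """Mean-square power of a signal."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(signal, noise, snr_db):
    """Add noise scaled so that 10*log10(P_signal / P_scaled_noise) == snr_db."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def synthetic_rir(rt60, fs, rng=None):
    """Gaussian-noise impulse response whose amplitude falls 60 dB after rt60 seconds."""
    rng = rng or random.Random(0)
    decay = math.log(1000.0) / rt60  # -60 dB in amplitude is a factor of 1000
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / fs) for i in range(int(rt60 * fs))]


def convolve(x, h):
    """Full linear convolution (direct form; fine for short examples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


fs = 1000  # illustrative sampling rate
clean = [math.sin(2 * math.pi * 50 * i / fs) for i in range(fs)]
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy = mix_at_snr(clean, noise, snr_db=10)
reverberant = convolve(clean, synthetic_rir(0.5, fs))
```

An exponentially decaying noise tail is a crude stand-in for a measured room response, but it gives direct control over RT60, which is the parameter Table 3 varies.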

Table 4 Ablation results on wind tunnel and slip-ring room signals for the four additional classes of normal signals, the ASP layer, and density k-NN (%; each cell: wind tunnel / slip-ring room)

4 extra classes	ASP	Density k-NN	AUC	pAUC	Accuracy	Recall	F1-score	Precision	Hmean
×	√	√	98.5 / 99.9	97.3 / 99.9	96.9 / 99.3	95.6 / 99.2	94.2 / 99.6	92.9 / 100.0	95.8 / 99.6
√	×	√	98.6 / 99.9	97.9 / 99.9	98.4 / 99.6	94.1 / 99.6	96.9 / 99.8	100.0 / 100.0	97.6 / 99.8
√	√	×	99.9 / 99.2	99.4 / 96.4	97.7 / 96.9	100.0 / 97.3	95.8 / 98.1	91.9 / 99.0	97.4 / 97.8
√	√	√	99.9 / 100.0	99.8 / 100.0	98.8 / 99.8	100.0 / 99.7	97.8 / 99.9	95.8 / 100.0	98.7 / 99.9
