高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

基于自适应松弛的鲁棒模糊C均值聚类算法

高云龙 王志豪 潘金艳 罗斯哲 王德鑫

高云龙, 王志豪, 潘金艳, 罗斯哲, 王德鑫. 基于自适应松弛的鲁棒模糊C均值聚类算法[J]. 电子与信息学报, 2020, 42(7): 1774-1781. doi: 10.11999/JEIT190556
引用本文: 高云龙, 王志豪, 潘金艳, 罗斯哲, 王德鑫. 基于自适应松弛的鲁棒模糊C均值聚类算法[J]. 电子与信息学报, 2020, 42(7): 1774-1781. doi: 10.11999/JEIT190556
Yunlong GAO, Zhihao WANG, Jinyan PAN, Sizhe LUO, Dexin WANG. Robust Fuzzy C-Means Based on Adaptive Relaxation[J]. Journal of Electronics & Information Technology, 2020, 42(7): 1774-1781. doi: 10.11999/JEIT190556
Citation: Yunlong GAO, Zhihao WANG, Jinyan PAN, Sizhe LUO, Dexin WANG. Robust Fuzzy C-Means Based on Adaptive Relaxation[J]. Journal of Electronics & Information Technology, 2020, 42(7): 1774-1781. doi: 10.11999/JEIT190556

基于自适应松弛的鲁棒模糊C均值聚类算法

doi: 10.11999/JEIT190556
基金项目: 国家自然科学基金(61203176),福建省自然科学基金(2013J05098, 2016J01756)
详细信息
    作者简介:

    高云龙:男,1979年生,副教授,主要研究方向为机器学习、时间序列分析和生产制造系统优化和调度

    王志豪:男,1993年生,硕士生,研究方向为机器学习和模式识别

    潘金艳:女,1978年生,副教授,主要研究方向为人工智能和机器学习理论与方法

    罗斯哲:男,1995年生,硕士生,研究方向为模式识别和维数约简

    通讯作者:

    王志豪 zhwang@stu.xmu.edu.cn

  • 中图分类号: TP391; TP273

Robust Fuzzy C-Means Based on Adaptive Relaxation

Funds: The National Natural Science Foundation of China (61203176), The Provincial Natural Science Foundation of Fujian Province (2013J05098, 2016J01756)
  • 摘要:

    噪声是影响聚类结果的最重要的因素之一,现有的模糊聚类算法主要通过对隶属度约束进行松弛的方式来降低噪声样本的影响。这种方式仍然存在两个基本问题需要解决:第一,如何评估一个样本是噪声的可能性;第二,如何在抑制噪声样本影响力的同时,保留正常样本的作用力。针对这两问题,该文提出了基于自适应松弛的鲁棒模糊C均值聚类算法(AR-RFCM)。新模型基于K最近邻的方式(KNN)来估计样本的可靠性,自适应地调整松弛参数,从而实现在降低噪声样本影响力的同时,保留可靠样本的作用力。此外,AR-RFCM利用了C均值聚类模型中隶属度的稀疏性来提高可靠样本的作用力,从而提高数据簇的内聚程度,进而降低噪声样本的影响。实验表明,AR-RFCM不仅在处理噪声样本时具有良好的鲁棒性,同时在25个UCI 数据集实验中,分类正确率(兰德指数)平均高于FCM算法7.7864%。

  • 图  1  不同聚类算法的隶属度函数

    图  2  不同聚类算法的隶属度函数

    图  3  不同聚类算法的隶属度函数

    表  1  各算法求得的隶属度值

    样本编号FCMPCMNCREFCMFDCM_SSRAR-RFCM
    C#1C#2C#1C#2C#1C#2C#1C#2C#1C#2C#1C#2
    10.0540.9460.0150.5390.0040.332000.0950.90500.984
    20.0080.9920.0180.5460.0050.332000.0600.94000.985
    30.0400.9600.0190.99901.00000.9920.0900.91001.000
    40.0930.9070.0180.5450.0050.332000.1370.86300.985
    50.0550.9450.0240.5520.0070.331000.1140.88600.985
    60.5000.5000.0030.0030.0010.001000.5000.50000
    70.5000.5000.0100.0100.0040.004000.5000.50000
    80.9450.0550.5520.0240.3310.007000.8860.1140.9850
    90.9920.0080.5460.0180.3320.005000.9410.0590.9850
    100.9600.0400.9990.0191.00000.99200.9100.0901.0000
    110.9070.0930.5450.0180.3320.005000.8630.1370.9850
    120.9460.0540.5390.0150.3320.004000.9050.0950.9840
    下载: 导出CSV

    表  2  各算法所得的聚类簇中心

    簇中心1簇中心2
    FCM$\left[ \begin{aligned} & { {\rm{3} }{\rm{.9870} } } \\ & { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9870} } } \\ &\ \ \, { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$
    PCM$\left[ \begin{aligned}& { {\rm{3} }{\rm{.9870} } } \\ & { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9870} } } \\ & \ \ \, { {\rm{0} }{\rm{.0011} } } \end{aligned} \right]$
    NC$\left[ \begin{aligned}& { {\rm{3} }{\rm{.9996} } } \\ & { {\rm{0} }{\rm{.0002} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9996} } } \\ & \ \ \, { {\rm{0} }{\rm{.0002} } } \end{aligned} \right]$
    REFCM$\left[ \begin{aligned} & { {\rm{4} }{\rm{.000} }0} \\ & { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 4} }{\rm{.000} }0} \\ &\ \ \, { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$
    FDCM_SSR$\left[ \begin{aligned}& { {\rm{3} }{\rm{.4833} } } \\ & { {\rm{1} }{\rm{.6530} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.4833} } } \\ &\ \ \, { {\rm{1} }{\rm{.6530} } } \end{aligned} \right]$
    AR-RFCM$\left[ \begin{aligned}& { {\rm{3} }{\rm{.9999} } } \\ & { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$$\left[ \begin{aligned}& { {\rm{ - 3} }{\rm{.9999} } } \\ &\ \ \, { {\rm{0} }{\rm{.0000} } } \end{aligned} \right]$
    下载: 导出CSV

    表  3  噪声数据集信息

    均值协方差
    簇1[1 2]$\left[ {\begin{array}{*{20}{c}}2&{0.2}\\{0.2}&2\end{array}} \right]$
    簇2[–1 –2]$\left[ {\begin{array}{*{20}{c}}2&0\\0&1\end{array}} \right]$
    簇3[–3 –5]$\left[ {\begin{array}{*{20}{c}}3&0\\0&1\end{array}} \right]$
    噪声[–3 5]$\left[ {\begin{array}{*{20}{c}}{10}&0\\0&2\end{array}} \right]$
    下载: 导出CSV

    表  4  噪声数据集实验结果

    FCMPCMNCREFCMFDCM_SSRAR-RFCM
    准确率0.86830.86830.87670.85830.86670.8833
    精确度0.67750.67750.69000.66250.67500.7000
    灵敏度0.90330.90330.92000.88330.90000.9333
    特异度0.85670.85670.86220.85000.85560.8667
    下载: 导出CSV

    表  5  不同聚类算法在UCI数据集的聚类结果的RI(%)

    FCMPCMNCREFCMFDCM_SSRAR-RFCM
    Ecoli79.66±0.4778.60±3.2878.25±1.1180.12±1.1379.83±0.4588.40±1.62
    Auto-mpg76.23±075.21±4.0277.27±0.4675.64±085.62±0.3077.64±2.09
    Dermatology83.49±1.1881.17±3.2669.82±6.6181.33±5.4984.88±1.5992.55±0
    Iris83.68±079.33±7.9082.27±4.2385.68±087.37±085.68±0
    Zoo87.91±1.7687.56±6.1588.90±5.8285.76±5.2289.78±2.3396.65±0
    Transfusion53.10±055.47±5.5956.30±3.7553.16±063.90±0.0963.68±0
    Parkinsons52.19±058.86±5.8664.40±5.3160.27±7.7463.12±0.5773.83±0
    Banknote51.52±059.55±9.7561.32±6.8959.68±12.1252.44±1.7366.32±2.79
    Creadit-approval67.57±0.3755.78±5.8264.89±053.96±6.5367.51±068.06±0
    Breast-cancer90.53±078.42±14.2586.24±16.9791.59±094.31±094.86±0
    Wine95.43±073.35±4.5791.85±6.6290.36±095.43±096.40±0.73
    Automobile71.83±0.4470.81±1.7872.08±1.6271.79±0.3871.99±0.2974.24±0.29
    Messidor Features50.49±050.62±0.3550.63±0.0150.86±050.65±0.5351.80±0.61
    Fertility50.00±065.74±12.9350.08±0.1857.99±6.5579.13±0.7178.85±1.75
    Seeds89.92±078.70±5.2888.53±0.5287.06±089.92±091.26±0.52
    Blance58.82±4.5858.49±5.7062.66±4.6960.55±5.0960.56±4.0765.97±3.54
    House Votes77.52±071.46±8.6175.49±6.0772.41±6.7677.86±078.90±0
    Vowel67.15±2.4782.68±1.6153.52±4.4482.91±0.9466.81±1.8185.47±0.20
    Glass71.31±0.4569.29±1.1971.58±0.9871.07±0.8671.59±0.3972.45±0.77
    Mammographic67.96±064.00±7.0567.98±0.0462.35±6.0268.55±068.11±0
    Pima Indians Diabetes59.07±056.23±3.2752.41±0.1358.09±059.06±0.0359.18±0.29
    Qualitative Bankruptcy94.53±074.58±16.7297.62±094.53±094.53±097.62±0
    Seismic Bumps51.88±071.15±12.5356.19±10.2987.70±087.69±0.1087.70±0
    Phishing Data68.72±0.2960.73±5.1869.13±0.2862.71±068.93±0.3969.67±0.09
    Yeast65.83±1.6172.95±1.2163.44±2.9072.95±2.0666.58±1.5875.71±0.03
    下载: 导出CSV
  • JAIN A K. Data clustering: 50 years beyond k-means[J]. Pattern Recognition Letters, 2010, 31(8): 651–666. doi: 10.1016/j.patrec.2009.09.011
    DENG Zhaohong, JIANG Yizhang, CHUNG Fulai, et al. Transfer prototype-based fuzzy clustering[J]. IEEE Transactions on Fuzzy Systems, 2016, 24(5): 1210–1232. doi: 10.1109/TFUZZ.2015.2505330
    张洁玉, 李佐勇. 基于核空间的加权邻域约束直觉模糊聚类算法[J]. 电子与信息学报, 2017, 39(9): 2162–2168. doi: 10.11999/JEIT161317

    ZHANG Jieyu and LI Zuoyong. Kernel-based algorithm with weighted spatial information intuitionistic fuzzy c-means[J]. Journal of Electronics &Information Technology, 2017, 39(9): 2162–2168. doi: 10.11999/JEIT161317
    LV Yinghua, MA Tinghuai, TANG Meili, et al. An efficient and scalable density-based clustering algorithm for datasets with complex structures[J]. Neurocomputing, 2016, 171: 9–22. doi: 10.1016/j.neucom.2015.05.109
    QIN Xiaoyu, TING Kaiming, ZHU Ye, et al. Nearest-neighbour-induced isolation similarity and its impact on density-based clustering[C]. The 33rd AAAI Conference on Artificial Intelligence, Honolulu, USA, 2019: 4755–4762. doi: 10.1609/aaai.v33i01.33014755.
    赵小强, 刘晓丽. 基于公理化模糊子集的改进谱聚类算法[J]. 电子与信息学报, 2018, 40(8): 1904–1910. doi: 10.11999/IEIT170904

    ZHAO Xiaoqiang and LIU Xiaoli. An improved spectral clustering algorithm based on axiomatic fuzzy set[J]. Journal of Electronics &Information Technology, 2018, 40(8): 1904–1910. doi: 10.11999/IEIT170904
    YIN Hao, BENSON A R, LESKOVEC J, et al. Local higher-order graph clustering[C]. The 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, Canada, 2017: 555–564. doi: 10.1145/3097983.3098069.
    MOSLEHI Z, TAHERI M, MIRZAEI A, et al. Discriminative fuzzy c-means as a large margin unsupervised metric learning algorithm[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(6): 3534–3544. doi: 10.1109/TFUZZ.2018.2836338
    刘解放, 王士同, 王骏, 等. 一种具有最优保证特性的贝叶斯可能性聚类方法[J]. 电子与信息学报, 2017, 39(7): 1554–1562. doi: 10.11999/JEIT160908

    LIU Jiefang, WANG Shitong, WANG Jun, et al. Bayesian possibilistic clustering method with optimality guarantees[J]. Journal of Electronics &Information Technology, 2017, 39(7): 1554–1562. doi: 10.11999/JEIT160908
    MACQUEEN J. Some methods for classification and analysis of multivariate observations[C]. The 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, USA, 1965: 281–297.
    BEZDEK J C, EHRLICH R, and FULL W. FCM: The fuzzy c-means clustering algorithm[J]. Computers & Geosciences, 1984, 10(2/3): 191–203.
    DE OLIVEIRA J V and PEDRYCZ W. Advances in Fuzzy Clustering and its Applications[M]. Chichester: John Wiley & Sons, Ltd., 2007. doi: 10.1002/9780470061190.
    DAVE R N. Characterization and detection of noise in clustering[J]. Pattern Recognition, 1991, 12(11): 657–664. doi: 10.1016/0167-8655(91)90002-4
    KRISHNAPURAM R and KELLER J M. A possibilistic approach to clustering[J]. IEEE Transactions on Fuzzy Systems, 1993, 1(2): 98–110. doi: 10.1109/91.227387
    ZARINBAL M, ZARANDI M H F, and TURKSEN I B. Relative entropy fuzzy c-means clustering[J]. Information Sciences, 2014, 260: 74–97. doi: 10.1016/j.ins.2013.11.004
    GU Jing, JIAO Licheng, YANG Shuyuan, et al. Fuzzy double c-means clustering based on sparse self-representation[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 612–626. doi: 10.1109/TFUZZ.2017.2686804
  • 加载中
图(3) / 表(5)
计量
  • 文章访问数:  2887
  • HTML全文浏览量:  2066
  • PDF下载量:  122
  • 被引次数: 0
出版历程
  • 收稿日期:  2019-07-24
  • 修回日期:  2020-03-13
  • 网络出版日期:  2020-04-09
  • 刊出日期:  2020-07-23

目录

    /

    返回文章
    返回