高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

一种基于属性空间相似性的模糊聚类算法

施伟锋 卓金宝 兰莹

施伟锋, 卓金宝, 兰莹. 一种基于属性空间相似性的模糊聚类算法[J]. 电子与信息学报, 2019, 41(11): 2722-2728. doi: 10.11999/JEIT180974
引用本文: 施伟锋, 卓金宝, 兰莹. 一种基于属性空间相似性的模糊聚类算法[J]. 电子与信息学报, 2019, 41(11): 2722-2728. doi: 10.11999/JEIT180974
Weifeng SHI, Jinbao ZHUO, Ying LAN. A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2722-2728. doi: 10.11999/JEIT180974
Citation: Weifeng SHI, Jinbao ZHUO, Ying LAN. A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space[J]. Journal of Electronics & Information Technology, 2019, 41(11): 2722-2728. doi: 10.11999/JEIT180974

一种基于属性空间相似性的模糊聚类算法

doi: 10.11999/JEIT180974
基金项目: 国家自然科学基金(61503240),上海海事大学研究生创新基金(2016ycx078)
详细信息
    作者简介:

    施伟锋:男,1963年生,博士,教授,主要研究方向为电力系统自动化

    卓金宝:男,1991年生,博士生,研究方向为智能故障诊断与预测

    兰莹:女,1985年生,博士,讲师,研究方向为多自主体与混杂系统研究

    通讯作者:

    施伟锋 wfshi@shmtu.edu.cn

  • 中图分类号: TP391

A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space

Funds: The National Natural Science Foundation of China (61503240), Shanghai Maritime University Graduate Student Innovation Fund Project (2016ycx078)
  • 摘要: 模糊C均值(FCM)聚类算法及其相关改进算法基于最大模糊隶属度原则确定聚类结果,没有充分利用迭代后的模糊隶属度矩阵和簇类中心的样本属性特征信息,影响聚类准确度。针对这个问题,该文提出一种新的改进思路:改进FCM算法输出定类原则。给出二元属性拓扑子空间中属性相似度的定义,最终提出一种基于属性空间相似性的改进FCM算法(FCM-SAS):首先,选择FCM算法聚类后模糊隶属度低于聚类置信度的样本作为存疑样本;然后,计算存疑样本与聚类后聚类中心的属性相似度;最后,基于最大属性相似度原则更新存疑样本的簇类标签。通过UCI数据集实验,证明算法不仅有效,还较一些基于最大模糊隶属度原则定类的改进算法具有更优的聚类评价指标。
  • 表  1  FCM-SAS算法具体步骤

    输入:样本集${\text{X}}$、样本数$n$,聚类个数$c$、加权指数$m$、迭代阈
    值$\varepsilon $、最大迭代次数T、聚类存疑率$\xi $、属性占比率$\kappa $;
    输出:样本标签集${{\text{X}}_l}'$;
    1表1的传统FCM算法步骤得到迭代后模糊隶属度矩阵${\text{U}}$、簇类中心${\text{V}}$和样本标签集${{\text{X}}_l}$,令$j = 0$;
    2计算所有样本点${\text{x}}$的模糊隶属度最大值,按递增顺序排序并组成数组,选出此数组中第$\left[ {\xi \times n} \right]$个元素的数值作为聚类置信度$\eta $;
    3令$j = j + 1$,判断第$j$个样本的模糊隶属度$\max \left( {\left\{ {{u_{ij}}\left| {i = 1, 2, ·\!·\!· , c} \right.} \right\}} \right)$是否不大于聚类置信度$\eta $,若是则此样本为存疑样本,转步骤4;否则转步骤8;
    4按式(8)计算第$j$个样本与各个簇类中心在2元属性拓扑子空间中的拓扑相似度集合$\gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$,将所有集合中的元素取绝对值后按递增的顺序排序并组成数组,计算此数组中第$\left[ {n \times {\gamma _{dis}} \times \kappa } \right]$个元素与数值1之间差的绝对值作为邻域半径$\delta $;
    5以$\delta $为邻域半径,按式(10)计算第$j$个样本与各个簇类中心的属性相似度$\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$;
    6若最大属性相似度$\max \left( {\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$只有一个,则选出最大属性相似度时的簇类中心所在的类别作为此样本更新后的标签$x_{lj}'$,转步骤8;否则,转步骤7;
    7若最大属性相似度不止一个,则选择这些簇类中最大拓扑相似度集合之和$\widehat S = \max \left( {\Sigma \gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$时的簇类中心所在的类别作为$x_{lj}'$;
    8判断$j < n$,若是则转步骤3,否则输出更新后的样本标签集${{\text{X}}_l}'$。
    下载: 导出CSV

    表  2  算法输入参数设置

    参数数值
    加权指数$m$2
    迭代阈值$\varepsilon $${10^{ - 3}}$
    最大迭代次数$T\;$$100$
    聚类存疑率$\xi $$0.3$
    属性占比率$\kappa $$0.5$
    下载: 导出CSV

    表  3  UCI数据集的统计描述

    数据集样本数维数簇类各类占比
    Iris1504350:50:50
    Wine17813359:71:48
    Seeds2107370:70:70
    Breast68392444:239
    Glass2149670:17:76:13:9:29
    下载: 导出CSV

    表  4  UCI数据集聚类结果评价指标对比(1)

    FCMRL-FCMRCAWFCMFRCMFCM-SAS(标准化样本)FCM-SAS(未标准化样本)
    IrisAR0.8930.9070.9670.9570.9600.9870.953
    RI0.8800.8920.9580.9340.9520.9830.942
    NMI0.7500.8310.8730.9490.8498
    SeedsAR0.8950.8950.9030.8950.8950.9190.900
    RI0.8740.8740.8840.8730.8760.8990.877
    NMI0.6950.6770.6970.7170.671
    BreastAR0.9370.9530.6550.9380.9470.9650.946
    RI0.8760.9100.5480.8840.9110.9320.897
    NMI0.7300.7360.7550.7820.688
    下载: 导出CSV

    表  5  UCI数据集聚类结果评价指标对比(2)

    FCMPSO-IFCMGA-IFCMABC-IFCMKFCMWGFCMFCM-SAS(标准化样本)FCM-SAS(未标准化样本)
    IrisAR0.8930.8070.8490.7870.8950.9730.9870.953
    WineAR0.9490.6550.6520.6420.9420.9660.9550.781
    GlassAR0.4210.4190.3930.4670.4600.7330.5330.472
    下载: 导出CSV

    表  6  FCM-SAS算法聚类过程统计数据

    样本集1次定类错误样本数1次定类正确的存疑样本数1次定类错误的存疑样本数存疑样本数2次定类正确样本数2次定类错误样本数
    Iris16291645432
    Seeds214419634815
    Breast43173272001919
    Wine945853476
    Glass1262142634518
    下载: 导出CSV
  • BEZDEK J C. Pattern Recognition with Fuzzy Objective Function Algorithms[M]. Boston: Springer, 1981: 155–201.
    FRIGUI H and KRISHNAPURAM R. A robust competitive clustering algorithm with applications in computer vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(5): 450–465. doi: 10.1109/34.765656
    YANG M and NATALIANI Y. Robust-learning fuzzy C-means clustering algorithm with unknown number of clusters[J]. Pattern Recognition, 2017, 71: 45–59. doi: 10.1016/j.patcog.2017.05.017
    SON L H and TIEN N D. Tune up fuzzy C-means for big data: Some novel hybrid clustering algorithms based on initial selection and incremental clustering[J]. International Journal of Fuzzy Systems, 2017, 19(5): 1585–1602. doi: 10.1007/s40815-016-0260-3
    HUANG Chengquan, CHUNG F, and WANG Shitong. Generalized competitive agglomeration clustering algorithm[J]. International Journal of Machine Learning and Cybernetics, 2017, 8(6): 1945–1969. doi: 10.1007/s13042-016-0572-5
    SINGH C and BALA A. A DCT-based local and non-local fuzzy C-means algorithm for segmentation of brain magnetic resonance images[J]. Applied Soft Computing, 2018, 68: 447–457. doi: 10.1016/j.asoc.2018.03.054
    SAHA A and DAS S. Geometric divergence based fuzzy clustering with strong resilience to noise features[J]. Pattern Recognition Letters, 2016, 79: 60–67. doi: 10.1016/j.patrec.2016.04.013
    肖满生, 肖哲, 文志诚, 等. 一种空间相关性与隶属度平滑的FCM改进算法[J]. 电子与信息学报, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710

    XIAO Mansheng, XIAO Zhe, WEN Zhicheng, et al. Improved fcm clustering algorithm based on spatial correlation and membership smoothing[J]. Journal of Electronics &Information Technology, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710
    WANG Xizhao, WANG Yadong, and WANG Lijuan. Improving fuzzy C-means clustering based on feature-weight learning[J]. Pattern Recognition Letters, 2004, 25(10): 1123–1132. doi: 10.1016/j.patrec.2004.03.008
    YANG M S and NATALIANI Y. A feature-reduction fuzzy clustering algorithm based on feature-weighted entropy[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 817–835. doi: 10.1109/TFUZZ.2017.2692203
    KUO R J, LIN T C, ZULVIA F E, et al. A hybrid metaheuristic and kernel intuitionistic fuzzy c-means algorithm for cluster analysis[J]. Applied Soft Computing, 2018, 67: 299–308. doi: 10.1016/j.asoc.2018.02.039
    JIANG Zhaohui, LI Tingting, MIN Wengfang, et al. Fuzzy c-means clustering based on weights and gene expression programming[J]. Pattern Recognition Letters, 2017, 90: 1–7. doi: 10.1016/j.patrec.2017.02.015
    JIE Lilin, LIU Weidong, SUN Zheng, et al. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means[J]. Neurocomputing, 2017, 249: 140–156. doi: 10.1016/j.neucom.2017.03.068
    KIM E H, OH S K, and PEDRYCZ W. Reinforced hybrid interval fuzzy neural networks architecture: Design and analysis[J]. Neurocomputing, 2018, 303: 20–36. doi: 10.1016/j.neucom.2018.04.003
    DAGHER I. Fuzzy clustering using multiple Gaussian kernels with optimized-parameters[J]. Fuzzy Optimization and Decision Making, 2018, 17(2): 159–176. doi: 10.1007/s10700-017-9268-x
    王骏, 刘欢, 蒋亦樟, 等. 堆叠隐空间模糊C均值聚类算法[J]. 控制与决策, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768

    WANG Jun, LIU Huan, JIANG Yizhang, et al. Cascaded hidden space fuzzy C means clustering algorithm[J]. Control and Decision, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768
    LUO Xiong, XU Yang, WANG Weiping, et al. Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy[J]. Journal of the Franklin Institute, 2018, 355(4): 1945–1966. doi: 10.1016/j.jfranklin.2017.08.014
    DAI Jianhua, HU Qinghua, HU Hu, et al. Neighbor inconsistent pair selection for attribute reduction by rough set approach[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 937–950. doi: 10.1109/TFUZZ.2017.2698420
    BU Fanyu. An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT[J]. Future Generation Computer Systems, 2018, 88: 675–682. doi: 10.1016/j.future.2018.04.045
    MACIEJEWSKI R, JANG Y, WOO I, et al. Abstracting attribute space for transfer function exploration and design[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(1): 94–107. doi: 10.1109/TVCG.2012.105
    李保珍, 张亭亭. 成对属性关联分析及其属性空间构建[J]. 情报学报, 2014, 33(11): 1194–1203.

    LI Baozhen and ZHANG Tingting. Association analysis of pairwise attributes and construction of attribute space[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(11): 1194–1203.
    WEI Cuiping, WANG Pei, and ZHANG Yuzhong. Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications[J]. Information Sciences, 2011, 181(19): 4273–4286. doi: 10.1016/j.ins.2011.06.001
    BACHE K and LICHMAN M. UCI irvine machine learning repository[EB/OL]. Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml2013.
    RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336): 846–850. doi: 10.1080/01621459.1971.10482356
    COOMBS C H, DAWES R M, and TVERSKY A. Mathematical Psychology: An Elementary Introduction[M]. Englewood Cliffs, USA: Prentice-Hall, 1970: 391–406.
    ZHANG Daoqiang and CHEN Songcan. Clustering incomplete data using kernel-based fuzzy C-means algorithm[J]. Neural Processing Letters, 2003, 18(3): 155–162. doi: 10.1023/B:NEPL.0000011135.19145.1b
  • 加载中
表(6)
计量
  • 文章访问数:  3094
  • HTML全文浏览量:  1210
  • PDF下载量:  79
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-10-17
  • 修回日期:  2019-02-28
  • 网络出版日期:  2019-04-25
  • 刊出日期:  2019-11-01

目录

    /

    返回文章
    返回