A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space
-
摘要: 模糊C均值(FCM)聚类算法及其相关改进算法基于最大模糊隶属度原则确定聚类结果,没有充分利用迭代后的模糊隶属度矩阵和簇类中心的样本属性特征信息,影响聚类准确度。针对这个问题,该文提出一种新的改进思路:改进FCM算法输出定类原则。给出二元属性拓扑子空间中属性相似度的定义,最终提出一种基于属性空间相似性的改进FCM算法(FCM-SAS):首先,选择FCM算法聚类后模糊隶属度低于聚类置信度的样本作为存疑样本;然后,计算存疑样本与聚类后聚类中心的属性相似度;最后,基于最大属性相似度原则更新存疑样本的簇类标签。通过UCI数据集实验,证明算法不仅有效,还较一些基于最大模糊隶属度原则定类的改进算法具有更优的聚类评价指标。Abstract: With the attribute feature information of the fuzzy membership matrix and cluster centers after the iteration not fully utilized, the results of Fuzzy C-Means (FCM) Clustering and related modified algorithms are determined based on the principle of maximum fuzzy membership, causing bad influence on the clustering accuracy. To solve this problem, the improvement ideas are proposed: to improve classification principle of FCM. The formula definition of attribute similarity in binary topological subspaces is given. Then, the improved FCM algorithm based on the Similarity of Attribute Space (FCM-SAS) is proposed: First, samples with fuzzy membership degree lower than the clustering reliability are selected as suspicious samples. Next, the attribute similarity between the suspicious samples and the cluster centers after clustering are calculated. Finally, cluster labels of suspicious samples based on the principle of maximum attribute similarity are updated. The validity and superiority of the proposed algorithm is verified by the UCI sample set experiments and comparisons with other modified algorithms based on the principle of maximum fuzzy membership.
-
表 1 FCM-SAS算法具体步骤
输入: 样本集${\text{X}}$、样本数$n$,聚类个数$c$、加权指数$m$、迭代阈
值$\varepsilon $、最大迭代次数T、聚类存疑率$\xi $、属性占比率$\kappa $;输出: 样本标签集${{\text{X}}_l}'$; 1 按表1的传统FCM算法步骤得到迭代后模糊隶属度矩阵${\text{U}}$、簇类中心${\text{V}}$和样本标签集${{\text{X}}_l}$,令$j = 0$; 2 计算所有样本点${\text{x}}$的模糊隶属度最大值,按递增顺序排序并组成数组,选出此数组中第$\left[ {\xi \times n} \right]$个元素的数值作为聚类置信度$\eta $; 3 令$j = j + 1$,判断第$j$个样本的模糊隶属度$\max \left( {\left\{ {{u_{ij}}\left| {i = 1, 2, ·\!·\!· , c} \right.} \right\}} \right)$是否不大于聚类置信度$\eta $,若是则此样本为存疑样本,转步骤4;否则转步骤8; 4 按式(8)计算第$j$个样本与各个簇类中心在2元属性拓扑子空间中的拓扑相似度集合$\gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$,将所有集合中的元素取绝对值后按递增的顺序排序并组成数组,计算此数组中第$\left[ {n \times {\gamma _{dis}} \times \kappa } \right]$个元素与数值1之间差的绝对值作为邻域半径$\delta $; 5 以$\delta $为邻域半径,按式(10)计算第$j$个样本与各个簇类中心的属性相似度$\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$; 6 若最大属性相似度$\max \left( {\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$只有一个,则选出最大属性相似度时的簇类中心所在的类别作为此样本更新后的标签$x_{lj}'$,转步骤8;否则,转步骤7; 7 若最大属性相似度不止一个,则选择这些簇类中最大拓扑相似度集合之和$\widehat S = \max \left( {\Sigma \gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$时的簇类中心所在的类别作为$x_{lj}'$; 8 判断$j < n$,若是则转步骤3,否则输出更新后的样本标签集${{\text{X}}_l}'$。 表 2 算法输入参数设置
参数 数值 加权指数$m$ 2 迭代阈值$\varepsilon $ ${10^{ - 3}}$ 最大迭代次数$T\;$ $100$ 聚类存疑率$\xi $ $0.3$ 属性占比率$\kappa $ $0.5$ 表 3 UCI数据集的统计描述
数据集 样本数 维数 簇类 各类占比 Iris 150 4 3 50:50:50 Wine 178 13 3 59:71:48 Seeds 210 7 3 70:70:70 Breast 683 9 2 444:239 Glass 214 9 6 70:17:76:13:9:29 表 4 UCI数据集聚类结果评价指标对比(1)
FCM RL-FCM RCA WFCM FRCM FCM-SAS(标准化样本) FCM-SAS(未标准化样本) Iris AR 0.893 0.907 0.967 0.957 0.960 0.987 0.953 RI 0.880 0.892 0.958 0.934 0.952 0.983 0.942 NMI 0.750 – – 0.831 0.873 0.949 0.8498 Seeds AR 0.895 0.895 0.903 0.895 0.895 0.919 0.900 RI 0.874 0.874 0.884 0.873 0.876 0.899 0.877 NMI 0.695 – – 0.677 0.697 0.717 0.671 Breast AR 0.937 0.953 0.655 0.938 0.947 0.965 0.946 RI 0.876 0.910 0.548 0.884 0.911 0.932 0.897 NMI 0.730 – – 0.736 0.755 0.782 0.688 表 5 UCI数据集聚类结果评价指标对比(2)
FCM PSO-IFCM GA-IFCM ABC-IFCM KFCM WGFCM FCM-SAS(标准化样本) FCM-SAS(未标准化样本) Iris AR 0.893 0.807 0.849 0.787 0.895 0.973 0.987 0.953 Wine AR 0.949 0.655 0.652 0.642 0.942 0.966 0.955 0.781 Glass AR 0.421 0.419 0.393 0.467 0.460 0.733 0.533 0.472 表 6 FCM-SAS算法聚类过程统计数据
样本集 1次定类错误样本数 1次定类正确的存疑样本数 1次定类错误的存疑样本数 存疑样本数 2次定类正确样本数 2次定类错误样本数 Iris 16 29 16 45 43 2 Seeds 21 44 19 63 48 15 Breast 43 173 27 200 191 9 Wine 9 45 8 53 47 6 Glass 126 21 42 63 45 18 -
BEZDEK J C. Pattern Recognition with Fuzzy Objective Function Algorithms[M]. Boston: Springer, 1981: 155–201. FRIGUI H and KRISHNAPURAM R. A robust competitive clustering algorithm with applications in computer vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(5): 450–465. doi: 10.1109/34.765656 YANG M and NATALIANI Y. Robust-learning fuzzy C-means clustering algorithm with unknown number of clusters[J]. Pattern Recognition, 2017, 71: 45–59. doi: 10.1016/j.patcog.2017.05.017 SON L H and TIEN N D. Tune up fuzzy C-means for big data: Some novel hybrid clustering algorithms based on initial selection and incremental clustering[J]. International Journal of Fuzzy Systems, 2017, 19(5): 1585–1602. doi: 10.1007/s40815-016-0260-3 HUANG Chengquan, CHUNG F, and WANG Shitong. Generalized competitive agglomeration clustering algorithm[J]. International Journal of Machine Learning and Cybernetics, 2017, 8(6): 1945–1969. doi: 10.1007/s13042-016-0572-5 SINGH C and BALA A. A DCT-based local and non-local fuzzy C-means algorithm for segmentation of brain magnetic resonance images[J]. Applied Soft Computing, 2018, 68: 447–457. doi: 10.1016/j.asoc.2018.03.054 SAHA A and DAS S. Geometric divergence based fuzzy clustering with strong resilience to noise features[J]. Pattern Recognition Letters, 2016, 79: 60–67. doi: 10.1016/j.patrec.2016.04.013 肖满生, 肖哲, 文志诚, 等. 一种空间相关性与隶属度平滑的FCM改进算法[J]. 电子与信息学报, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710XIAO Mansheng, XIAO Zhe, WEN Zhicheng, et al. Improved fcm clustering algorithm based on spatial correlation and membership smoothing[J]. Journal of Electronics &Information Technology, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710 WANG Xizhao, WANG Yadong, and WANG Lijuan. Improving fuzzy C-means clustering based on feature-weight learning[J]. Pattern Recognition Letters, 2004, 25(10): 1123–1132. doi: 10.1016/j.patrec.2004.03.008 YANG M S and NATALIANI Y. A feature-reduction fuzzy clustering algorithm based on feature-weighted entropy[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 817–835. doi: 10.1109/TFUZZ.2017.2692203 KUO R J, LIN T C, ZULVIA F E, et al. A hybrid metaheuristic and kernel intuitionistic fuzzy c-means algorithm for cluster analysis[J]. Applied Soft Computing, 2018, 67: 299–308. doi: 10.1016/j.asoc.2018.02.039 JIANG Zhaohui, LI Tingting, MIN Wengfang, et al. Fuzzy c-means clustering based on weights and gene expression programming[J]. Pattern Recognition Letters, 2017, 90: 1–7. doi: 10.1016/j.patrec.2017.02.015 JIE Lilin, LIU Weidong, SUN Zheng, et al. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means[J]. Neurocomputing, 2017, 249: 140–156. doi: 10.1016/j.neucom.2017.03.068 KIM E H, OH S K, and PEDRYCZ W. Reinforced hybrid interval fuzzy neural networks architecture: Design and analysis[J]. Neurocomputing, 2018, 303: 20–36. doi: 10.1016/j.neucom.2018.04.003 DAGHER I. Fuzzy clustering using multiple Gaussian kernels with optimized-parameters[J]. Fuzzy Optimization and Decision Making, 2018, 17(2): 159–176. doi: 10.1007/s10700-017-9268-x 王骏, 刘欢, 蒋亦樟, 等. 堆叠隐空间模糊C均值聚类算法[J]. 控制与决策, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768WANG Jun, LIU Huan, JIANG Yizhang, et al. Cascaded hidden space fuzzy C means clustering algorithm[J]. Control and Decision, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768 LUO Xiong, XU Yang, WANG Weiping, et al. Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy[J]. Journal of the Franklin Institute, 2018, 355(4): 1945–1966. doi: 10.1016/j.jfranklin.2017.08.014 DAI Jianhua, HU Qinghua, HU Hu, et al. Neighbor inconsistent pair selection for attribute reduction by rough set approach[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 937–950. doi: 10.1109/TFUZZ.2017.2698420 BU Fanyu. An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT[J]. Future Generation Computer Systems, 2018, 88: 675–682. doi: 10.1016/j.future.2018.04.045 MACIEJEWSKI R, JANG Y, WOO I, et al. Abstracting attribute space for transfer function exploration and design[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(1): 94–107. doi: 10.1109/TVCG.2012.105 李保珍, 张亭亭. 成对属性关联分析及其属性空间构建[J]. 情报学报, 2014, 33(11): 1194–1203.LI Baozhen and ZHANG Tingting. Association analysis of pairwise attributes and construction of attribute space[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(11): 1194–1203. WEI Cuiping, WANG Pei, and ZHANG Yuzhong. Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications[J]. Information Sciences, 2011, 181(19): 4273–4286. doi: 10.1016/j.ins.2011.06.001 BACHE K and LICHMAN M. UCI irvine machine learning repository[EB/OL]. Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml2013. RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336): 846–850. doi: 10.1080/01621459.1971.10482356 COOMBS C H, DAWES R M, and TVERSKY A. Mathematical Psychology: An Elementary Introduction[M]. Englewood Cliffs, USA: Prentice-Hall, 1970: 391–406. ZHANG Daoqiang and CHEN Songcan. Clustering incomplete data using kernel-based fuzzy C-means algorithm[J]. Neural Processing Letters, 2003, 18(3): 155–162. doi: 10.1023/B:NEPL.0000011135.19145.1b
计量
- 文章访问数: 3094
- HTML全文浏览量: 1210
- PDF下载量: 79
- 被引次数: 0