一种基于属性空间相似性的模糊聚类算法

施伟锋; 卓金宝; 兰莹

doi:10.11999/JEIT180974

一种基于属性空间相似性的模糊聚类算法

doi: 10.11999/JEIT180974

上海海事大学上海 201306

基金项目: 国家自然科学基金(61503240)，上海海事大学研究生创新基金(2016ycx078)

详细信息

作者简介:
施伟锋：男，1963年生，博士，教授，主要研究方向为电力系统自动化

卓金宝：男，1991年生，博士生，研究方向为智能故障诊断与预测

兰莹：女，1985年生，博士，讲师，研究方向为多自主体与混杂系统研究

通讯作者:
施伟锋　wfshi@shmtu.edu.cn

中图分类号: TP391
计量
- 文章访问数: 3102
- HTML全文浏览量: 1215
- PDF下载量: 79
- 被引次数: 0
出版历程
- 收稿日期: 2018-10-17
- 修回日期: 2019-02-28
- 网络出版日期: 2019-04-25
- 刊出日期: 2019-11-01

A Novel Fuzzy Clustering Algorithm Based on Similarity of Attribute Space

Shanghai Maritime University, Shanghai 201306, China

Funds: The National Natural Science Foundation of China (61503240), Shanghai Maritime University Graduate Student Innovation Fund Project (2016ycx078)

摘要

摘要: 模糊C均值(FCM)聚类算法及其相关改进算法基于最大模糊隶属度原则确定聚类结果，没有充分利用迭代后的模糊隶属度矩阵和簇类中心的样本属性特征信息，影响聚类准确度。针对这个问题，该文提出一种新的改进思路：改进FCM算法输出定类原则。给出二元属性拓扑子空间中属性相似度的定义，最终提出一种基于属性空间相似性的改进FCM算法(FCM-SAS)：首先，选择FCM算法聚类后模糊隶属度低于聚类置信度的样本作为存疑样本；然后，计算存疑样本与聚类后聚类中心的属性相似度；最后，基于最大属性相似度原则更新存疑样本的簇类标签。通过UCI数据集实验，证明算法不仅有效，还较一些基于最大模糊隶属度原则定类的改进算法具有更优的聚类评价指标。
- 模糊C均值聚类 /
- 属性拓扑子空间 /
- 拓扑相似度 /
- 聚类置信度 /
- 最大属性相似度原则
Abstract: With the attribute feature information of the fuzzy membership matrix and cluster centers after the iteration not fully utilized, the results of Fuzzy C-Means (FCM) Clustering and related modified algorithms are determined based on the principle of maximum fuzzy membership, causing bad influence on the clustering accuracy. To solve this problem, the improvement ideas are proposed: to improve classification principle of FCM. The formula definition of attribute similarity in binary topological subspaces is given. Then, the improved FCM algorithm based on the Similarity of Attribute Space (FCM-SAS) is proposed: First, samples with fuzzy membership degree lower than the clustering reliability are selected as suspicious samples. Next, the attribute similarity between the suspicious samples and the cluster centers after clustering are calculated. Finally, cluster labels of suspicious samples based on the principle of maximum attribute similarity are updated. The validity and superiority of the proposed algorithm is verified by the UCI sample set experiments and comparisons with other modified algorithms based on the principle of maximum fuzzy membership.
- Fuzzy C-Means (FCM) clustering /
- Attribute topology subspace /
- Attribute similarity /
- Clustering reliability /
- Principle of maximum attribute similarity

HTML全文

表 1 FCM-SAS算法具体步骤

输入：	样本集${\text{X}}$、样本数$n$，聚类个数$c$、加权指数$m$、迭代阈值$\varepsilon $、最大迭代次数T、聚类存疑率$\xi $、属性占比率$\kappa $；
输出：	样本标签集${{\text{X}}_l}'$；
1	按表1的传统FCM算法步骤得到迭代后模糊隶属度矩阵${\text{U}}$、簇类中心${\text{V}}$和样本标签集${{\text{X}}_l}$，令$j = 0$；
2	计算所有样本点${\text{x}}$的模糊隶属度最大值，按递增顺序排序并组成数组，选出此数组中第$\left[ {\xi \times n} \right]$个元素的数值作为聚类置信度$\eta $；
3	令$j = j + 1$，判断第$j$个样本的模糊隶属度$\max \left( {\left\{ {{u_{ij}}\left\| {i = 1, 2, ·\!·\!· , c} \right.} \right\}} \right)$是否不大于聚类置信度$\eta $，若是则此样本为存疑样本，转步骤4；否则转步骤8；
4	按式(8)计算第$j$个样本与各个簇类中心在2元属性拓扑子空间中的拓扑相似度集合$\gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$，将所有集合中的元素取绝对值后按递增的顺序排序并组成数组，计算此数组中第$\left[ {n \times {\gamma _{dis}} \times \kappa } \right]$个元素与数值1之间差的绝对值作为邻域半径$\delta $；
5	以$\delta $为邻域半径，按式(10)计算第$j$个样本与各个簇类中心的属性相似度$\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)$；
6	若最大属性相似度$\max \left( {\psi \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$只有一个，则选出最大属性相似度时的簇类中心所在的类别作为此样本更新后的标签$x_{lj}'$，转步骤8；否则，转步骤7；
7	若最大属性相似度不止一个，则选择这些簇类中最大拓扑相似度集合之和$\widehat S = \max \left( {\Sigma \gamma \left( {{{\text{x}}_j}, {{\text{v}}_i}} \right)} \right)$时的簇类中心所在的类别作为$x_{lj}'$；
8	判断$j < n$，若是则转步骤3，否则输出更新后的样本标签集${{\text{X}}_l}'$。

下载: 导出CSV

表 2 算法输入参数设置

参数	数值
加权指数$m$	2
迭代阈值$\varepsilon $	${10^{ - 3}}$
最大迭代次数$T\;$	$100$
聚类存疑率$\xi $	$0.3$
属性占比率$\kappa $	$0.5$

下载: 导出CSV

表 3 UCI数据集的统计描述

数据集	样本数	维数	簇类	各类占比
Iris	150	4	3	50:50:50
Wine	178	13	3	59:71:48
Seeds	210	7	3	70:70:70
Breast	683	9	2	444:239
Glass	214	9	6	70:17:76:13:9:29

下载: 导出CSV

表 4 UCI数据集聚类结果评价指标对比(1)

		FCM	RL-FCM	RCA	WFCM	FRCM	FCM-SAS(标准化样本)	FCM-SAS(未标准化样本)
Iris	AR	0.893	0.907	0.967	0.957	0.960	0.987	0.953
	RI	0.880	0.892	0.958	0.934	0.952	0.983	0.942
	NMI	0.750	–	–	0.831	0.873	0.949	0.8498
Seeds	AR	0.895	0.895	0.903	0.895	0.895	0.919	0.900
	RI	0.874	0.874	0.884	0.873	0.876	0.899	0.877
	NMI	0.695	–	–	0.677	0.697	0.717	0.671
Breast	AR	0.937	0.953	0.655	0.938	0.947	0.965	0.946
	RI	0.876	0.910	0.548	0.884	0.911	0.932	0.897
	NMI	0.730	–	–	0.736	0.755	0.782	0.688

下载: 导出CSV

表 5 UCI数据集聚类结果评价指标对比(2)

		FCM	PSO-IFCM	GA-IFCM	ABC-IFCM	KFCM	WGFCM	FCM-SAS(标准化样本)	FCM-SAS(未标准化样本)
Iris	AR	0.893	0.807	0.849	0.787	0.895	0.973	0.987	0.953
Wine	AR	0.949	0.655	0.652	0.642	0.942	0.966	0.955	0.781
Glass	AR	0.421	0.419	0.393	0.467	0.460	0.733	0.533	0.472

下载: 导出CSV

表 6 FCM-SAS算法聚类过程统计数据

样本集	1次定类错误样本数	1次定类正确的存疑样本数	1次定类错误的存疑样本数	存疑样本数	2次定类正确样本数	2次定类错误样本数
Iris	16	29	16	45	43	2
Seeds	21	44	19	63	48	15
Breast	43	173	27	200	191	9
Wine	9	45	8	53	47	6
Glass	126	21	42	63	45	18

下载: 导出CSV

参考文献(26)

BEZDEK J C. Pattern Recognition with Fuzzy Objective Function Algorithms[M]. Boston: Springer, 1981: 155–201.

FRIGUI H and KRISHNAPURAM R. A robust competitive clustering algorithm with applications in computer vision[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(5): 450–465. doi: 10.1109/34.765656

YANG M and NATALIANI Y. Robust-learning fuzzy C-means clustering algorithm with unknown number of clusters[J]. Pattern Recognition, 2017, 71: 45–59. doi: 10.1016/j.patcog.2017.05.017

SON L H and TIEN N D. Tune up fuzzy C-means for big data: Some novel hybrid clustering algorithms based on initial selection and incremental clustering[J]. International Journal of Fuzzy Systems, 2017, 19(5): 1585–1602. doi: 10.1007/s40815-016-0260-3

HUANG Chengquan, CHUNG F, and WANG Shitong. Generalized competitive agglomeration clustering algorithm[J]. International Journal of Machine Learning and Cybernetics, 2017, 8(6): 1945–1969. doi: 10.1007/s13042-016-0572-5

SINGH C and BALA A. A DCT-based local and non-local fuzzy C-means algorithm for segmentation of brain magnetic resonance images[J]. Applied Soft Computing, 2018, 68: 447–457. doi: 10.1016/j.asoc.2018.03.054

SAHA A and DAS S. Geometric divergence based fuzzy clustering with strong resilience to noise features[J]. Pattern Recognition Letters, 2016, 79: 60–67. doi: 10.1016/j.patrec.2016.04.013

肖满生, 肖哲, 文志诚, 等. 一种空间相关性与隶属度平滑的FCM改进算法[J]. 电子与信息学报, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710

XIAO Mansheng, XIAO Zhe, WEN Zhicheng, et al. Improved fcm clustering algorithm based on spatial correlation and membership smoothing[J]. Journal of Electronics &Information Technology, 2017, 39(5): 1123–1129. doi: 10.11999/JEIT160710

WANG Xizhao, WANG Yadong, and WANG Lijuan. Improving fuzzy C-means clustering based on feature-weight learning[J]. Pattern Recognition Letters, 2004, 25(10): 1123–1132. doi: 10.1016/j.patrec.2004.03.008

YANG M S and NATALIANI Y. A feature-reduction fuzzy clustering algorithm based on feature-weighted entropy[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 817–835. doi: 10.1109/TFUZZ.2017.2692203

KUO R J, LIN T C, ZULVIA F E, et al. A hybrid metaheuristic and kernel intuitionistic fuzzy c-means algorithm for cluster analysis[J]. Applied Soft Computing, 2018, 67: 299–308. doi: 10.1016/j.asoc.2018.02.039

JIANG Zhaohui, LI Tingting, MIN Wengfang, et al. Fuzzy c-means clustering based on weights and gene expression programming[J]. Pattern Recognition Letters, 2017, 90: 1–7. doi: 10.1016/j.patrec.2017.02.015

JIE Lilin, LIU Weidong, SUN Zheng, et al. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means[J]. Neurocomputing, 2017, 249: 140–156. doi: 10.1016/j.neucom.2017.03.068

KIM E H, OH S K, and PEDRYCZ W. Reinforced hybrid interval fuzzy neural networks architecture: Design and analysis[J]. Neurocomputing, 2018, 303: 20–36. doi: 10.1016/j.neucom.2018.04.003

DAGHER I. Fuzzy clustering using multiple Gaussian kernels with optimized-parameters[J]. Fuzzy Optimization and Decision Making, 2018, 17(2): 159–176. doi: 10.1007/s10700-017-9268-x

王骏, 刘欢, 蒋亦樟, 等. 堆叠隐空间模糊C均值聚类算法[J]. 控制与决策, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768

WANG Jun, LIU Huan, JIANG Yizhang, et al. Cascaded hidden space fuzzy C means clustering algorithm[J]. Control and Decision, 2016, 31(9): 1671–1677. doi: 10.13195/j.kzyjc.2015.0768

LUO Xiong, XU Yang, WANG Weiping, et al. Towards enhancing stacked extreme learning machine with sparse autoencoder by correntropy[J]. Journal of the Franklin Institute, 2018, 355(4): 1945–1966. doi: 10.1016/j.jfranklin.2017.08.014

DAI Jianhua, HU Qinghua, HU Hu, et al. Neighbor inconsistent pair selection for attribute reduction by rough set approach[J]. IEEE Transactions on Fuzzy Systems, 2018, 26(2): 937–950. doi: 10.1109/TFUZZ.2017.2698420

BU Fanyu. An efficient fuzzy c-means approach based on canonical polyadic decomposition for clustering big data in IoT[J]. Future Generation Computer Systems, 2018, 88: 675–682. doi: 10.1016/j.future.2018.04.045

MACIEJEWSKI R, JANG Y, WOO I, et al. Abstracting attribute space for transfer function exploration and design[J]. IEEE Transactions on Visualization and Computer Graphics, 2013, 19(1): 94–107. doi: 10.1109/TVCG.2012.105

李保珍, 张亭亭. 成对属性关联分析及其属性空间构建[J]. 情报学报, 2014, 33(11): 1194–1203.

LI Baozhen and ZHANG Tingting. Association analysis of pairwise attributes and construction of attribute space[J]. Journal of the China Society for Scientific and Technical Information, 2014, 33(11): 1194–1203.

WEI Cuiping, WANG Pei, and ZHANG Yuzhong. Entropy, similarity measure of interval-valued intuitionistic fuzzy sets and their applications[J]. Information Sciences, 2011, 181(19): 4273–4286. doi: 10.1016/j.ins.2011.06.001

BACHE K and LICHMAN M. UCI irvine machine learning repository[EB/OL]. Irvine, CA: University of California, School of Information and Computer Science, http://archive.ics.uci.edu/ml2013.

RAND W M. Objective criteria for the evaluation of clustering methods[J]. Journal of the American Statistical Association, 1971, 66(336): 846–850. doi: 10.1080/01621459.1971.10482356

COOMBS C H, DAWES R M, and TVERSKY A. Mathematical Psychology: An Elementary Introduction[M]. Englewood Cliffs, USA: Prentice-Hall, 1970: 391–406.

ZHANG Daoqiang and CHEN Songcan. Clustering incomplete data using kernel-based fuzzy C-means algorithm[J]. Neural Processing Letters, 2003, 18(3): 155–162. doi: 10.1023/B:NEPL.0000011135.19145.1b

施引文献

资源附件(0)

访问统计