Robust Fuzzy C-means Clustering Algorithm Integrating Between-cluster Information

GAO Yunlong (高云龙); YANG Chengyu (杨程宇); WANG Zhihao (王志豪); LUO Sizhe (罗斯哲); PAN Jinyan (潘金艳)

doi: 10.11999/JEIT180604 cstr: 32379.14.JEIT180604

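1. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China
2. Information Engineering College, Jimei University, Xiamen 361021, China

Funds: The National Natural Science Foundation of China (61203176), The Natural Science Foundation of Fujian Province (2013J05098, 2016J01756)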

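Author biographies:
GAO Yunlong: male, born in 1979, associate professor. Research interests: machine learning, time series analysis, and optimization and scheduling of manufacturing systems.

YANG Chengyu: male, born in 1996, undergraduate student. Research interest: machine learning.

WANG Zhihao: male, born in 1993, master's student. Research interests: pattern recognition and machine learning.

LUO Sizhe: male, born in 1995, master's student. Research interests: dimensionality reduction, pattern recognition, and machine learning.

PAN Jinyan: female, born in 1978, associate professor. Research interests: artificial intelligence, and machine learning theory and methods.

Corresponding author:
PAN Jinyan, gaoyl@xmu.edu.cn

CLC number: TP311.13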
Publication history
- Received: 2018-06-20
- Revised: 2018-12-24
- Available online: 2018-12-28
- Published: 2019-05-01

Robust Fuzzy C-means Clustering Algorithm Integrating Between-cluster Information

1.
School of Aerospace Engineering, Xiamen University, Xiamen 361102, China
2.
Information Engineering College, Jimei University, Xiamen 361021, China

Funds: The National Natural Science Foundation of China (61203176), The Natural Science Foundation of Fujian Province (2013J05098, 2016J01756)

Abstract:
Compared with the classical K-means algorithm, the Fuzzy C-Means (FCM) clustering algorithm introduces a fuzzy factor to account for the relationships between different clusters, which yields clustering results with better separability. However, the fuzzy factor makes every sample point fuzzy, that is, each sample can belong to more than one cluster. As a result, FCM is highly sensitive to noise and outliers, and its clustering results generalize poorly. This paper therefore proposes a Robust FCM algorithm integrating Between-cluster Information (RBI-FCM). RBI-FCM exploits the sparsity that K-means imposes on the memberships to reduce the interactions among different clusters and to improve the separability of sample points located in the regions where clusters adjoin. In addition, RBI-FCM takes the separability between clusters into account while minimizing the within-cluster scatter, which improves the generalization performance of the clustering model. An effective iterative algorithm for solving the model is designed. Experimental results show that RBI-FCM improves the robustness of FCM and effectively reduces its sensitivity to differences in cluster distributions and to imbalanced cluster sizes, yielding satisfactory clustering results.

Keywords: Clustering; Fuzzy C-Means (FCM); Sample distribution; Between-cluster information
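
As a point of reference for the fuzzy memberships discussed in the abstract, the following is a minimal sketch of the standard FCM iteration (alternating membership and center updates). It is not RBI-FCM, whose objective function is not given in this section. The function names `sq_dist` and `fcm`, the fuzzifier `m = 2`, the optional `init_centers` argument, and the toy data are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm(points, c, m=2.0, iters=100, tol=1e-9, init_centers=None, seed=0):
    """Standard FCM sketch. Returns (centers, memberships), memberships[i][k] = u_ik."""
    n, dim = len(points), len(points[0])
    if init_centers is None:
        # Assumption: pick c distinct samples as initial centers.
        init_centers = random.Random(seed).sample(points, c)
    centers = [list(v) for v in init_centers]
    power = 1.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik^2 / d_ij^2)^(1/(m-1)).
        memberships = []
        for p in points:
            d = [max(sq_dist(p, v), 1e-12) for v in centers]
            memberships.append(
                [1.0 / sum((d[k] / d[j]) ** power for j in range(c)) for k in range(c)]
            )
        # Center update: mean of the samples weighted by u_ik^m.
        new_centers = []
        for k in range(c):
            w = [memberships[i][k] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][t] for i in range(n)) / total for t in range(dim)]
            )
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


# Toy data (hypothetical): two well-separated groups of four points each.
toy = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, u = fcm(toy, c=2, init_centers=[(0, 0), (11, 11)])
# Hard labels by maximum membership, as a K-means-style assignment would give.
labels = [row.index(max(row)) for row in u]
```

Every row of `u` is strictly positive and sums to one, so each sample contributes to every center. This is the property the abstract identifies as the source of FCM's sensitivity to noise and outliers.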

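Figure 1 Distribution curves of the maximum membership values in the clustering results

Figure 2 Synthetic datasets with clusters of different density

Figure 3 Clustering accuracy curves

Figure 4 Synthetic datasets with imbalanced cluster sizes

Figure 5 Clustering accuracy curves

Figure 6 Synthetic non-spherical dataset and clustering results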

Table 1 Experiment 1: main parameters of the synthetic datasets

Dataset	Cluster centers	Covariance matrices	Samples per cluster
1	(5, 5), (15, 15)	[1 0; 0 1], [1 0; 0 1]	50, 50
2	(5, 5), (15, 15)	[1 0; 0 1], [2 0; 0 2]	50, 50
$\vdots $	$\vdots $	$\vdots $	$\vdots $
10	(5, 5), (15, 15)	[1 0; 0 1], [10 0; 0 10]	50, 50
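
The datasets of Table 1 could be generated along the following lines. This is a sketch under stated assumptions: the elided rows are read as covariance k·I for dataset k (consistent with rows 1, 2, and 10), the clusters are Gaussian, and the function name `make_density_dataset` and the seeds are hypothetical.

```python
import math
import random


def make_density_dataset(k, n_per_cluster=50, seed=0):
    """Two 2-D Gaussian clusters: covariance I at (5, 5) and k*I at (15, 15)."""
    rng = random.Random(seed)
    specs = [((5.0, 5.0), 1.0), ((15.0, 15.0), float(k))]  # (center, variance)
    points, labels = [], []
    for label, (center, var) in enumerate(specs):
        std = math.sqrt(var)  # isotropic covariance var*I, so std = sqrt(var) per axis
        for _ in range(n_per_cluster):
            points.append((rng.gauss(center[0], std), rng.gauss(center[1], std)))
            labels.append(label)
    return points, labels


# Datasets 1..10; the seed per dataset is an arbitrary choice for reproducibility.
density_datasets = {k: make_density_dataset(k, seed=k) for k in range(1, 11)}
```

As k grows, the second cluster becomes sparser while the first stays fixed, which is the density difference that Experiment 1 varies.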

Table 2 Experiment 2: main parameters of the synthetic datasets

Dataset	Centers of the circles in which samples are randomly distributed	Samples per cluster
1	(5, 5), (15, 15)	50, 50
2	(5, 5), (15, 15)	50, 51
$\vdots $	$\vdots $	$\vdots $
151	(5, 5), (15, 15)	50, 200
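
A sketch of how the size-imbalanced datasets of Table 2 might be generated. The circle radius is not given in this section, so `RADIUS = 5.0` is purely an assumption, as are uniform sampling inside each circle, the one-sample step between consecutive datasets (consistent with 50, 51, ..., 200 over 151 datasets), and the names `sample_in_circle` and `make_imbalanced_dataset`.

```python
import math
import random

CENTERS = [(5.0, 5.0), (15.0, 15.0)]
RADIUS = 5.0  # assumption: the radius is not stated in the table


def sample_in_circle(rng, center, radius):
    """Draw one point uniformly from the disk of the given center and radius."""
    r = radius * math.sqrt(rng.random())  # sqrt gives uniform density over the area
    theta = 2.0 * math.pi * rng.random()
    return (center[0] + r * math.cos(theta), center[1] + r * math.sin(theta))


def make_imbalanced_dataset(n_second, n_first=50, seed=0):
    """First cluster has n_first samples, second cluster has n_second samples."""
    rng = random.Random(seed)
    points, labels = [], []
    for label, (center, n) in enumerate(zip(CENTERS, (n_first, n_second))):
        for _ in range(n):
            points.append(sample_in_circle(rng, center, RADIUS))
            labels.append(label)
    return points, labels


# Dataset idx has 50 samples in the first cluster and 49 + idx in the second.
imbalanced_datasets = {
    idx: make_imbalanced_dataset(49 + idx, seed=idx) for idx in range(1, 152)
}
```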

Table 3 NMI and RI of the clustering experiments on UCI datasets

UCI dataset	Metric	FCM	PFCM	GIFP-FCM	RBI-FCM
Auto-mpg	NMI	0.5190	0.5167	0.5008	0.5443
Auto-mpg	RI	0.7534	0.7537	0.7505	0.7895
Zoo	NMI	0.6760	0.6824	0.6284	0.6873
Zoo	RI	0.8381	0.8400	0.8236	0.8464
Parkinsons	NMI	0.0926	0.0936	0.0526	0.1071
Parkinsons	RI	0.5934	0.5934	0.5693	0.6266
Credit Approval	NMI	0.0304	0.0304	0.0365	0.1020
Credit Approval	RI	0.5048	0.5048	0.5207	0.5448
Banknote Authentication	NMI	0.0292	0.0292	0.1145	0.5249
Banknote Authentication	RI	0.5236	0.5236	0.5555	0.8053
Wine	NMI	0.4169	0.4168	0.3946	0.4911
Wine	RI	0.7104	0.7105	0.6700	0.7287
Balance Scale	NMI	0.1223	0.1232	0.1293	0.1326
Balance Scale	RI	0.5887	0.5900	0.5806	0.5947
House Votes	NMI	0.4743	0.4743	0.2917	0.4948
House Votes	RI	0.7752	0.7752	0.6688	0.7890
Vowel	NMI	0.3019	0.3127	0.3357	0.3737
Vowel	RI	0.7755	0.7988	0.8275	0.8153
Mammographic Masses	NMI	0.1054	0.1065	0.1020	0.1130
Mammographic Masses	RI	0.5676	0.5683	0.5524	0.5746
Note: for each dataset, the first row gives the NMI and the second row gives the RI.
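
For readers unfamiliar with the two scores in Table 3, the sketch below computes the Rand index (RI) and normalized mutual information (NMI) from two label lists. The normalization used by the authors is not stated in this section; dividing by the geometric mean of the two entropies is an assumption, as are the function names `rand_index` and `nmi` and the convention of returning 0 when either labeling has zero entropy.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of sample pairs on which the two labelings agree (same/different)."""
    pairs = list(combinations(range(len(true_labels)), 2))
    agree = sum(
        (true_labels[i] == true_labels[j]) == (pred_labels[i] == pred_labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(counts, n):
    """Shannon entropy (natural log) of a labeling given its cluster sizes."""
    return -sum(v / n * math.log(v / n) for v in counts.values())


def nmi(true_labels, pred_labels):
    """Mutual information divided by sqrt(H(true) * H(pred)) (assumed normalization)."""
    n = len(true_labels)
    ct, cp = Counter(true_labels), Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))
    mi = sum(
        nij / n * math.log(nij * n / (ct[a] * cp[b])) for (a, b), nij in joint.items()
    )
    h_t, h_p = entropy(ct, n), entropy(cp, n)
    if h_t == 0.0 or h_p == 0.0:
        return 0.0  # convention chosen here for a single-cluster labeling
    return mi / math.sqrt(h_t * h_p)


truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]    # same partition, cluster names swapped
independent = [0, 1, 0, 1]  # partition unrelated to the truth
scores = {
    "relabeled": (rand_index(truth, relabeled), nmi(truth, relabeled)),
    "independent": (rand_index(truth, independent), nmi(truth, independent)),
}
```

Both scores are invariant to how the clusters are named, which is why they suit clustering evaluation, where predicted cluster indices carry no intrinsic meaning.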

