An Improved Fuzzy Clustering Method for Interval Uncertain Data

XIAO Mansheng; ZHANG Longxin; ZHANG Xiaoli; HU Yongxiang

doi:10.11999/JEIT190591

cstr: 32379.14.JEIT190591

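School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China

Funding: The National Natural Science Foundation of China (61702178), The Natural Science Foundation of Hunan Province (2018JJ4068), The Research Project of Hunan Provincial Department of Education (18C0499)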

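Author biographies:
XIAO Mansheng: male, born in 1968, professor. His main research interests are intelligent computing and intelligent information processing.

ZHANG Longxin: male, born in 1983, Ph.D., lecturer. His research interests are big data and data security.

ZHANG Xiaoli: female, born in 1994, master's degree. Her research interest is intelligent information processing.

Corresponding author:
XIAO Mansheng, xiaomansheng@tom.com

CLC number: TN911.7; TP391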
Publication history
- Received: 2019-08-06
- Revised: 2020-02-19
- Published online: 2020-03-14
- Issue date: 2020-08-18

Abstract:
An improved fuzzy C-means clustering algorithm (IU-IFCM) is proposed for interval-valued uncertain data. First, a feature transformation maps each interval sample with p features to a real-valued sample with 2p features. Second, a sample distance measure is designed that takes into account the relationship between the interval median value and the interval size, and fuzzy C-means is then used to cluster the interval samples. Theoretical analysis and comparative experiments show that the Partition Coefficient (PC) and Correct Rank (CR) values of the proposed algorithm are on average more than 10% higher than those of the compared methods, which indicates better clustering accuracy. The algorithm provides a new solution for classifying uncertain data in current big data environments.
- Interval data /
- Fuzzy C-means /
- Impact factor /
- Feature transformation
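
The two steps named in the abstract (a p-to-2p feature transformation, then fuzzy C-means with a distance that accounts for interval median and size) can be sketched as follows. The abstract does not give the exact mapping or distance, so this sketch assumes that each interval [a, b] maps to its midpoint and half-width, and that a hypothetical weight `alpha` scales the size term. The names `to_midpoint_radius`, `weighted_sq_dist`, `fcm`, and `partition_coefficient` are illustrative and are not the authors' implementation. The partition coefficient is computed in its standard form, the mean of the squared memberships.

```python
import numpy as np

def to_midpoint_radius(intervals):
    """Map (n, p, 2) interval data to (n, 2p) real data: p midpoints, then p half-widths."""
    x = np.asarray(intervals, dtype=float)
    mid = (x[..., 0] + x[..., 1]) / 2.0
    rad = (x[..., 1] - x[..., 0]) / 2.0
    return np.hstack([mid, rad])

def weighted_sq_dist(z, centers, p, alpha):
    """Squared distance: midpoint part plus alpha times the size part (alpha is an assumption)."""
    diff = z[:, None, :] - centers[None, :, :]            # shape (n, c, 2p)
    w = np.concatenate([np.ones(p), np.full(p, alpha)])   # per-feature weights
    return (w * diff ** 2).sum(axis=2)                    # shape (n, c)

def fcm(z, c, p, alpha=0.5, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means on the transformed data with the weighted distance above."""
    rng = np.random.default_rng(seed)
    u = rng.random((z.shape[0], c))
    u /= u.sum(axis=1, keepdims=True)                     # memberships sum to 1 per sample
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ z) / um.sum(axis=0)[:, None]    # membership-weighted means
        d2 = np.maximum(weighted_sq_dist(z, centers, p, alpha), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return u, centers

def partition_coefficient(u):
    """PC = (1/n) * sum of squared memberships; ranges from 1/c to 1."""
    return float((u ** 2).sum() / u.shape[0])

# Toy usage: six one-feature intervals forming two obvious groups.
toy = [[[0.0, 1.0]], [[0.2, 1.2]], [[0.1, 0.9]],
       [[10.0, 11.0]], [[10.3, 11.1]], [[9.8, 11.0]]]
z_toy = to_midpoint_radius(toy)
u_toy, centers_toy = fcm(z_toy, c=2, p=1)
print(u_toy.argmax(axis=1), partition_coefficient(u_toy))
```

Because the weights are positive and fixed per feature, the usual membership-weighted mean remains the optimal center update; only the distance used for the memberships changes.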

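Figure 1 Comparison of the partition coefficients of the four algorithms

Figure 2 Comparison of PC and CR of the four algorithms on the Fish dataset

Figure 3 Synthetic interval dataset

Figure 4 Comparison of PC and CR of the four algorithms on the synthetic dataset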

Table 1 Fat_Oil dataset

Sample	Specific gravity (g/cm³)	Freezing point (°C)	Iodine value	Saponification value
Linseed oil	[0.930, 0.935]	[-27, -8]	[170, 204]	[118, 196]
Perilla oil	[0.930, 0.937]	[-5, -4]	[192, 208]	[188, 197]
Cottonseed oil	[0.916, 0.918]	[-6, -1]	[99, 113]	[189, 198]
Sesame oil	[0.920, 0.926]	[-6, -4]	[104, 116]	[187, 193]
Camellia oil	[0.916, 0.917]	[-21, -15]	[80, 82]	[189, 193]
Olive oil	[0.914, 0.919]	[0, 6]	[79, 90]	[187, 196]
Beef tallow	[0.860, 0.870]	[30, 38]	[40, 48]	[190, 199]
Lard	[0.858, 0.864]	[22, 32]	[53, 77]	[190, 202]

Table 2 Classification results of the four algorithms on the Fish dataset

	Detritivorous	Carnivorous	Omnivorous	Herbivorous
A priori classification	1 2 3 4	5 6 7 8	9 10	11 12
E_FCM	1 2 5	4 6 3	7 10	8 9 11 12
M_FCM	1 3 4	6 10 11	2 8	5 7 9 12
D_FCM	1 2 4	5 6 8 9	3 10 11	7 12
IU_IFCM	1 2 3 4	6 7 8	5 9 10	11 12
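
The rows of Table 2 can be compared with the prior classification in its first data row by counting, for each algorithm, how many of the 12 samples fall in the same column as in that row. The sketch below does only that. It is a simple agreement count and not the paper's CR index, and the names `PRIOR`, `RESULTS`, and `matches` are illustrative.

```python
# Column-wise cluster contents copied from Table 2 (sample indices 1..12).
PRIOR = [{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10}, {11, 12}]
RESULTS = {
    "E_FCM": [{1, 2, 5}, {4, 6, 3}, {7, 10}, {8, 9, 11, 12}],
    "M_FCM": [{1, 3, 4}, {6, 10, 11}, {2, 8}, {5, 7, 9, 12}],
    "D_FCM": [{1, 2, 4}, {5, 6, 8, 9}, {3, 10, 11}, {7, 12}],
    "IU_IFCM": [{1, 2, 3, 4}, {6, 7, 8}, {5, 9, 10}, {11, 12}],
}

def matches(clusters, prior=PRIOR):
    """Number of samples placed in the same column as in the prior classification."""
    return sum(len(c & p) for c, p in zip(clusters, prior))

agreement = {name: matches(clusters) for name, clusters in RESULTS.items()}
print(agreement)
# {'E_FCM': 6, 'M_FCM': 5, 'D_FCM': 8, 'IU_IFCM': 11}
```

By this count IU_IFCM places 11 of the 12 samples in the prior column; the single difference is sample 5.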

Table 3 Synthetic dataset

Parameter	Class 1	Class 2	Class 3
${m_1}$	28	60	45
${m_2}$	22	30	38
$\sigma _1^2$	100	9	9
$\sigma _2^2$	9	144	9
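
One way to read Table 3 is as the means and variances of three bivariate normal classes with independent components. The sketch below draws interval centers from those distributions and wraps each center in an interval. The class size (`n_per_class`), the width range (`min_width`, `max_width`), and the uniform width distribution are assumptions made for illustration only; the page does not state how the intervals in Figure 3 were built.

```python
import numpy as np

# Table 3 parameters per class: (m1, m2, variance1, variance2).
CLASS_PARAMS = {1: (28, 22, 100, 9), 2: (60, 30, 9, 144), 3: (45, 38, 9, 9)}

def make_interval_dataset(n_per_class=50, min_width=0.5, max_width=8.0, seed=0):
    """Return (X, y): X has shape (n, 2, 2) as [sample, feature, (lower, upper)]."""
    rng = np.random.default_rng(seed)
    data, labels = [], []
    for label, (m1, m2, v1, v2) in CLASS_PARAMS.items():
        # Interval centers: independent normal components with the tabulated variances.
        centers = rng.normal([m1, m2], np.sqrt([v1, v2]), size=(n_per_class, 2))
        # Hypothetical interval widths, uniform in [min_width, max_width].
        half = rng.uniform(min_width, max_width, size=(n_per_class, 2)) / 2.0
        data.append(np.stack([centers - half, centers + half], axis=-1))
        labels += [label] * n_per_class
    return np.concatenate(data), np.array(labels)

X_syn, y_syn = make_interval_dataset()
print(X_syn.shape, np.bincount(y_syn)[1:])
```

The resulting array has one lower and one upper bound per feature, so it can be passed to any interval clustering routine that expects that layout.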
