高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

一种改进的区间型不确定数据模糊聚类方法

肖满生 张龙信 张晓丽 胡永祥

肖满生, 张龙信, 张晓丽, 胡永祥. 一种改进的区间型不确定数据模糊聚类方法[J]. 电子与信息学报, 2020, 42(8): 1968-1974. doi: 10.11999/JEIT190591
引用本文: 肖满生, 张龙信, 张晓丽, 胡永祥. 一种改进的区间型不确定数据模糊聚类方法[J]. 电子与信息学报, 2020, 42(8): 1968-1974. doi: 10.11999/JEIT190591
Mansheng XIAO, Longxin ZHANG, Xiaoli ZHANG, Yongxiang HU. An Improved Fuzzy Clustering Method for Interval Uncertain Data[J]. Journal of Electronics & Information Technology, 2020, 42(8): 1968-1974. doi: 10.11999/JEIT190591
Citation: Mansheng XIAO, Longxin ZHANG, Xiaoli ZHANG, Yongxiang HU. An Improved Fuzzy Clustering Method for Interval Uncertain Data[J]. Journal of Electronics & Information Technology, 2020, 42(8): 1968-1974. doi: 10.11999/JEIT190591

一种改进的区间型不确定数据模糊聚类方法

doi: 10.11999/JEIT190591
基金项目: 国家自然科学基金(61702178),湖南省自然科学基金(2018JJ4068),湖南省教育厅科研项目(18C0499)
详细信息
    作者简介:

    肖满生:男,1968年生,教授,主要研究方向为智能计算和智能信息处理

    张龙信:男,1983年生,博士,讲师,研究方向为大数据与数据安全

    张晓丽:女,1994年生,硕士,研究方向为智能信息处理

    通讯作者:

    肖满生 xiaomansheng@tom.com

  • 中图分类号: TN911.7; TP391

An Improved Fuzzy Clustering Method for Interval Uncertain Data

Funds: The National Natural Science Foundation of China (61702178), The Natural Science Foundation of Hunan Provierce (2018554068), The Research Project of Hunan Provincial Department of Education (18C0499)
  • 摘要:

    针对区间型不确定数据的特点,该文提出一种改进的模糊C均值聚类算法(IU-IFCM)。首先对区间型数据进行特征变换,由p维特征映射成由2p维特征组成的实数据,然后考虑区间中值与区间大小关系,设计一种样本距离计算方法,通过模糊C均值实现对区间型样本聚类。理论分析与对比实验表明,该算法的划分系数(PC)及正确等级(CR)值比其它方法平均提高10%以上,表明有更好的聚类精度,对当前大数据环境下不确定数据的分类提供了一种新的解决方案。

  • 图  1  4种算法的划分系数比较

    图  2  Fish数据集4种算法的PC, CR比较

    图  3  人工合成区间数据集

    图  4  人工合成数据集4种算法的PC、CR比较

    表  1  Fat_Oil数据集

    样本比重(g/cm3)冰点(°C)io值sa值
    亚麻油[0.930 0.935][–27 –8][170 204][118 196]
    紫苏油[0.930 0.937][–5 –4][192 208][188 197]
    棉籽油[0.916 0.918][–6 –1][99 113][189 198]
    芝麻油[0.920 0.926][–6 –4][104 116][187 193]
    山茶油[0.916 0.917][–21 –15][80 82][189 193]
    橄榄油[0.914 0.919][0 6][79 90][187 196]
    牛油[0.860 0.870][30 38][40 48][190 199]
    猪油[0.858 0.864][22 32][53 77][190 202]
    下载: 导出CSV

    表  2  4种算法对Fish数据集的分类结果

    腐屑性肉食性杂食性草食性
    先验分类1 2 3 45 6 7 89 1011 12
    E_FCM1 2 54 6 37 108 9 11 12
    M_FCM1 3 46 10 112 85 7 9 12
    D_FCM1 2 45 6 8 93 10 117 12
    IU_IFCM1 2 3 46 7 85 9 1011 12
    下载: 导出CSV

    表  3  人工数据集

    参数类1类2类3
    ${m_1}$286045
    ${m_2}$223038
    $\sigma _1^2$10099
    $\sigma _2^2$91449
    下载: 导出CSV
  • JIANG Bin, PEI Jian, TAO Yufei, et al. Clustering uncertain data based on probability distribution similarity[J]. IEEE Transactions on Knowledge and Data Engineering, 2013, 25(4): 751–763. doi: 10.1109/TKDE.2011.221
    GULLO F and TAGARELLI A. Uncertain centroid based partitional clustering of uncertain data[J]. Proceedings of the VLDB Endowment, 2012, 5(7): 610–621. doi: 10.14778/2180912.2180914
    DALLACHIESA M, JACQUES-SILVA G, GEDIK B, et al. Sliding windows over uncertain data streams[J]. Knowledge and Information Systems, 2015, 45(1): 159–190. doi: 10.1007/s10115-014-0804-5
    彭宇, 罗清华, 彭喜元. UIDK-means: 多维不确定性测量数据聚类算法[J]. 仪器仪表学报, 2011, 32(6): 1201–1207. doi: 10.19650/j.cnki.cjsi.2011.06.001

    PENG Yu, LUO Qinghua, and PENG Xiyuan. UIDK-means: A Multi-dimensional uncertain measurement data clustering algorithm[J]. Chinese Journal of Scientific Instrument, 2011, 32(6): 1201–1207. doi: 10.19650/j.cnki.cjsi.2011.06.001
    BAO Chaozheng, PENG Hongming, HE Di, et al. Adaptive fuzzy c-means clustering algorithm for interval data type based on interval-dividing technique[J]. Pattern Analysis and Applications, 2018, 21(3): 803–812. doi: 10.1007/s10044-017-0663-2
    D’URSO P, MASSARI R, DE GIOVANNI L, et al. Exponential distance-based fuzzy clustering for interval-valued data[J]. Fuzzy Optimization and Decision Making, 2017, 16(1): 51–70. doi: 10.1007/s10700-016-9238-8
    BRITO P, SILVA A P D, and DIAS J G. Probabilistic clustering of interval data[J]. Intelligent Data Analysis, 2015, 19(2): 293–313. doi: 10.3233/IDA-150718
    HAMDAN H. Maximum likelihood estimation from interval-valued data. Application to fuzzy clustering[C]. The 13th International Conference on Theory and Application of Fuzzy Systems and Soft Computing -ICAFS-2018. Istanbul, Turkey, 2019: 3–10. doi: 10.1007/978-3-030-04164-9_3.
    谢志伟, 王志明. 一种区间型数据的自适应模糊C均值聚类算法[J]. 计算机工程与应用, 2012, 48(17): 193–198, 237. doi: 10.3778/j.issn.1002-8331.2012.17.038

    XIE Zhiwei and WANG Zhiming. Self-adapting fuzzy c means clustering algorithm for interval data[J]. Computer Engineering and Applications, 2012, 48(17): 193–198, 237. doi: 10.3778/j.issn.1002-8331.2012.17.038
    GAO Xinbo, JI Hongbing, and XIE Weixin. A novel FCM clustering algorithm for interval-valued data and fuzzy-valued data[C]. The 5th International Conference on Signal Processing Proceedings. The 16th World Computer Congress 2000, Beijing, China, 2000: 1551–1555. doi: 10.1109/ICOSP.2000.893395.
    MACIEL L, BALLINI R, GOMIDE F, et al. Participatory learning fuzzy clustering for interval-valued data[C]. The 16th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems, Eindhoven, The Netherlands, 2016: 687–698. doi: 10.1007/978-3-319-40596-4_57.
    兰蓉. 模糊信息距离及其若干应用[D]. [博士论文], 西安电子科技大学, 2013: 61–73.

    LAN Rong. Fuzzy information distances and their some applications[D]. [Ph.D. dissertation], Xidian University, 2013: 61–73.
    金萍, 宗瑜, 屈世超, 等. 面向不确定数据的近似骨架启发式聚类算法[J]. 南京大学学报: 自然科学, 2015, 51(1): 197–205. doi: 10.13232/j.cnki.jnju.2015.01.027

    JIN Ping, ZONG Yu, QU Shichao, et al. Approximate backbone guided heuristic clustering algorithm for uncertain data[J]. Journal of Nanjing University:Natural Sciences, 2015, 51(1): 197–205. doi: 10.13232/j.cnki.jnju.2015.01.027
    魏方圆, 黄德才. 基于区间数的多维不确定性数据UID-DBSCAN聚类算法[J]. 计算机科学, 2017, 44(11A): 442–447. doi: 10.11896/j.issn.1002-137X.2017.11A.094

    WEI Fangyuan and HUANG Decai. UID-DBSCAN clustering algorithm of multi-dimensional uncertain data based on interval number[J]. Computer Science, 2017, 44(11A): 442–447. doi: 10.11896/j.issn.1002-137X.2017.11A.094
    ZHANG Qin, FANG Zhigeng, LIU Sifeng, et al. On variable weight clustering model of generalized interval grey numbers for multiple uncertain data[J]. Journal of Grey System, 2019, 31(1): 84–99.
    陆亿红, 任胜亮. 基于区间数的不确定数据流2κ近邻聚类算法[J]. 浙江工业大学学报, 2018, 46(3): 321–326. doi: 10.3969/j.issn.1006-4303.2018.03.015

    LU Yihong and REN Shengliang. The clustering algorithm of uncertain data stream 2κ-near neighbors based on interval number[J]. Journal of Zhejiang University of Technology, 2018, 46(3): 321–326. doi: 10.3969/j.issn.1006-4303.2018.03.015
    张新猛, 蒋盛益. 一种基于相似度概率的不确定分类数据聚类算法[J]. 山东大学学报: 工学版, 2011, 41(3): 12–16.

    ZHANG Xinmeng and JIANG Shengyi. An algorithm for clustering uncertain categorical data based on similarity probability[J]. Journal of Shandong University:Engineering Science, 2011, 41(3): 12–16.
    TRAN L and DUCKSTEIN L. Comparison of fuzzy numbers using a fuzzy distance measure[J]. Fuzzy Sets and Systems, 2002, 130(3): 331–341. doi: 10.1016/s0165-0114(01)00195-6
  • 加载中
图(4) / 表(3)
计量
  • 文章访问数:  2184
  • HTML全文浏览量:  1076
  • PDF下载量:  109
  • 被引次数: 0
出版历程
  • 收稿日期:  2019-08-06
  • 修回日期:  2020-02-19
  • 网络出版日期:  2020-03-14
  • 刊出日期:  2020-08-18

目录

    /

    返回文章
    返回