High Dimensional Clustering Algorithm Based on Local Significant Units

Zong Yu; Li Ming-Chu; Xu Guan-Dong; Zhang Yan-Chun

doi:10.3724/SP.J.1146.2009.01589

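Funding:

Supported by the Key Program of the National Natural Science Foundation of China (90715037), the National Basic Research Program of China (973 Program, 2007CB714205), the Australian Research Council (ARC) project DP0770479, and the Key Projects of the Education Department of Anhui Province (KJ2009A54, KJ2010A325).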

Publication history
- Received: 2009-12-11
- Revised: 2010-05-20
- Published: 2010-11-19

Abstract: High-dimensional clustering algorithms based on density grids of equal or random width cannot guarantee high-quality clustering results on complex data sets. To address this problem, this paper proposes a High dimensional Clustering algorithm based on Local Significant Units (HC_LSU), built on kernel density estimation and spatial statistics. First, a structure called the Local Significant Unit (LSU) is defined through local kernel density estimation and a spatial statistical test to capture the local data distribution. Second, a greedy algorithm, the Greedy Algorithm for LSU (GA_LSU), is designed to quickly find the local significant regions that cover the data distribution. Finally, the single-linkage algorithm is run on LSUs that share the same attribute subset to produce the clusters. Experimental results on 4 synthetic and 6 real-world data sets show that HC_LSU effectively finds high-quality clusters hidden in highly complex data sets.
Keywords: Clustering analysis; High dimensional Clustering (HC) algorithm; Kernel density estimation; Local Significant Unit (LSU)
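
The abstract names the three stages of HC_LSU but does not give their exact definitions, so the Python sketch below is only an illustration under our own assumptions, not the authors' implementation. It assumes every attribute is scaled to [0, 1]; approximates the local kernel density estimate by a box-kernel window whose half-width follows the normal-reference (Silverman) rule of thumb; stands in for the spatial statistical test with a one-sided binomial test of a window's point count against a uniform (complete spatial randomness) null; replaces GA_LSU with a simple greedy cover that tries candidate units in order of decreasing observed/expected ratio; and implements single linkage by linking units with the same attribute subset whose boxes overlap. The function names (`binom_sf`, `candidate_unit`, `greedy_units`, `cluster_units`, and so on) and the significance level `alpha` are hypothetical.

```python
# Hypothetical LSU-style sketch (not the authors' HC_LSU/GA_LSU); attributes of X must lie in [0, 1].
import math
from collections import defaultdict


def binom_sf(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p), summed term by term in log space."""
    if k <= 0 or (p >= 1.0 and k <= n):
        return 1.0
    if k > n or p <= 0.0:
        return 0.0
    lp, lq = math.log(p), math.log1p(-p)
    logs = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * lp + (n - i) * lq for i in range(k, n + 1))
    return min(1.0, sum(math.exp(v) for v in logs))


def bandwidths(X):
    """Per-attribute normal-reference (Silverman) bandwidth 1.06 * sd * n^(-1/5)."""
    n, hs = len(X), []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (n - 1))
        hs.append(max(1.06 * sd * n ** -0.2, 1e-6))  # floor avoids zero-width windows
    return hs


def members(X, box):
    """Indices of the points inside box = {attribute: (lo, hi)}."""
    return [r for r, row in enumerate(X)
            if all(lo <= row[j] <= hi for j, (lo, hi) in box.items())]


def volume(box):
    return math.prod(hi - lo for lo, hi in box.values())


def candidate_unit(X, i, hs, alpha):
    """Box around point i over the attributes whose local window (a box-kernel
    density estimate) holds significantly more points than a uniform null predicts."""
    box = {}
    for j, h in enumerate(hs):
        lo, hi = max(0.0, X[i][j] - h), min(1.0, X[i][j] + h)
        k = sum(lo <= row[j] <= hi for row in X)
        if binom_sf(k, len(X), hi - lo) < alpha:  # one-sided binomial test
            box[j] = (lo, hi)
    return box


def is_significant(X, box, alpha):
    """Binomial test of the box's point count against its volume under uniformity."""
    return bool(box) and binom_sf(len(members(X, box)), len(X), volume(box)) < alpha


def lift(X, box):
    """Observed / expected point count under uniformity (0 for an empty box)."""
    return len(members(X, box)) / (len(X) * volume(box)) if box else 0.0


def greedy_units(X, alpha=1e-3):
    """Greedy cover: try candidate units by decreasing lift, keep the significant
    ones, and skip seed points already covered by an accepted unit."""
    hs = bandwidths(X)
    cands = [candidate_unit(X, i, hs, alpha) for i in range(len(X))]
    units, covered = [], set()
    for i in sorted(range(len(X)), key=lambda s: lift(X, cands[s]), reverse=True):
        if i not in covered and is_significant(X, cands[i], alpha):
            units.append(cands[i])
            covered.update(members(X, cands[i]))
    return units


def cluster_units(units):
    """Single linkage inside each attribute subset: boxes that overlap on every
    attribute are linked; clusters are connected components (union-find)."""
    parent = list(range(len(units)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    groups = defaultdict(list)
    for u, box in enumerate(units):
        groups[frozenset(box)].append(u)
    for idx in groups.values():
        for pos, a in enumerate(idx):
            for b in idx[pos + 1:]:
                if all(units[a][j][0] <= units[b][j][1] and units[b][j][0] <= units[a][j][1]
                       for j in units[a]):
                    parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for u in range(len(units)):
        clusters[find(u)].append(u)
    return list(clusters.values())
```

A call such as `cluster_units(greedy_units(X))` returns lists of unit indices, one list per cluster, and `members(X, unit)` maps a unit back to its points. Ranking candidates by observed/expected ratio rather than raw count is a deliberate choice in this sketch: a wide window on a single attribute can contain more points than a tight multi-attribute unit, and visiting it first would mark the tight unit's points as covered before that unit is considered.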
