Volume 26 Issue 8
Aug.  2004
Li Jie, Gao Xin-bo, Jiao Li-cheng. A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values[J]. Journal of Electronics & Information Technology, 2004, 26(8): 1203-1209.

A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values

  • Received Date: 2003-03-27
  • Revision Received Date: 2003-07-08
  • Publish Date: 2004-08-19
  • In data mining, cluster analysis must often be performed on large data sets with mixed numerical and categorical values. However, most existing clustering algorithms are efficient only for numerical data, not for mixed data sets. To address this, this paper presents a novel clustering algorithm for mixed data sets that modifies the common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Experimental results show that the new GA-based clustering algorithm is feasible for large data sets with mixed numerical and categorical values.
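The abstract does not give the modified cost function or the GA operators, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a k-prototypes-style cost: the numeric part is the usual trace of the within-cluster dispersion matrix (squared distance to the cluster mean), and the categorical part counts attribute mismatches against the cluster mode, weighted by a hypothetical parameter `gamma`. The label-vector encoding, tournament selection, uniform crossover, and the names `mixed_cost` and `ga_cluster` are likewise illustrative choices, not the authors' method.

```python
import numpy as np


def mixed_cost(X_num, X_cat, labels, k, gamma=1.0):
    """Illustrative within-cluster dispersion for mixed data (assumed form).

    Numeric part: squared Euclidean distance to the cluster mean, i.e. the
    trace of the within-cluster dispersion matrix.
    Categorical part: mismatches against the cluster mode, weighted by gamma.
    """
    total = 0.0
    for c in range(k):
        members = labels == c
        if not members.any():
            continue  # an empty cluster contributes nothing
        num = X_num[members]
        total += ((num - num.mean(axis=0)) ** 2).sum()
        cat = X_cat[members]
        for j in range(cat.shape[1]):
            _, counts = np.unique(cat[:, j], return_counts=True)
            total += gamma * (len(cat) - counts.max())  # mismatches vs. mode
    return float(total)


def ga_cluster(X_num, X_cat, k, gamma=1.0, pop_size=30, generations=60,
               mutation_rate=0.05, seed=0):
    """Toy GA over label vectors; returns (best_labels, best_cost, history)."""
    rng = np.random.default_rng(seed)
    n = len(X_num)
    pop = rng.integers(0, k, size=(pop_size, n))  # one label per data point
    history = []
    for _ in range(generations):
        costs = np.array([mixed_cost(X_num, X_cat, ind, k, gamma)
                          for ind in pop])
        pop = pop[np.argsort(costs)]  # sort so index 0 is the fittest
        history.append(float(costs.min()))
        children = [pop[0].copy()]  # elitism: keep the current best unchanged
        while len(children) < pop_size:
            # Binary tournament: in a sorted population the lower index wins.
            a = pop[rng.integers(0, pop_size, size=2).min()]
            b = pop[rng.integers(0, pop_size, size=2).min()]
            child = np.where(rng.random(n) < 0.5, a, b)  # uniform crossover
            flip = rng.random(n) < mutation_rate
            child[flip] = rng.integers(0, k, size=flip.sum())  # mutation
            children.append(child)
        pop = np.array(children)
    costs = np.array([mixed_cost(X_num, X_cat, ind, k, gamma) for ind in pop])
    best = pop[np.argmin(costs)]
    return best, float(costs.min()), history
```

Calling `ga_cluster(X_num, X_cat, k=2)` with a numeric array `X_num` of shape `(n, p)` and a categorical array `X_cat` of shape `(n, q)` returns a label vector, its cost, and the best cost per generation. Because the best individual is always carried over, that history never increases. A label-vector encoding scales poorly with the number of points, so a practical implementation for large data sets would more likely encode cluster prototypes instead.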