Li Jie, Gao Xin-bo, Jiao Li-cheng. A GA-Based Clustering Algorithm for Large Data Sets with Mixed Numerical and Categorical Values[J]. Journal of Electronics & Information Technology, 2004, 26(8): 1203-1209.
In data mining, cluster analysis must often be performed on large data sets with mixed numerical and categorical values. However, most existing clustering algorithms are efficient only for numerical data, not for mixed data. To address this, this paper presents a novel clustering algorithm for such mixed data sets, obtained by modifying a common cost function, the trace of the within-cluster dispersion matrix. A genetic algorithm (GA) is used to optimize the new cost function and obtain a valid clustering result. Experimental results show that the new GA-based clustering algorithm is feasible for large data sets with mixed numerical and categorical values.
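The abstract does not give the modified cost function, so the following is only a minimal sketch under stated assumptions. It assumes a k-prototypes-style cost: the squared distance to the cluster mean for numerical attributes, plus a weighted count of mismatches against the cluster mode for categorical attributes. The names `mixed_cost` and `ga_cluster`, the weight `gamma`, and all GA settings (population size, one-point crossover, mutation rate) are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def mixed_cost(num, cat, labels, k, gamma=1.0):
    """Assumed mixed-data dispersion: numerical scatter + gamma * categorical mismatches."""
    total = 0.0
    dims = len(num[0])
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue  # an empty cluster contributes nothing in this sketch
        # Numerical part: squared Euclidean distance to the cluster mean.
        mean = [sum(num[i][d] for i in idx) / len(idx) for d in range(dims)]
        total += sum((num[i][d] - mean[d]) ** 2 for i in idx for d in range(dims))
        # Categorical part: mismatches against the per-attribute mode.
        for a in range(len(cat[0])):
            mode = Counter(cat[i][a] for i in idx).most_common(1)[0][0]
            total += gamma * sum(cat[i][a] != mode for i in idx)
    return total


def ga_cluster(num, cat, k, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Toy GA over label vectors: elitist selection, one-point crossover, random mutation."""
    rng = random.Random(seed)
    n = len(num)  # requires n >= 2 for the crossover cut point

    def cost(chromosome):
        return mixed_cost(num, cat, chromosome, k)

    # Each chromosome assigns one cluster label to every record.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=cost)
```

Calling `ga_cluster(num, cat, k)` with `num` as a list of numerical feature rows and `cat` as a list of categorical feature rows returns one label per record. Encoding the partition directly as a label vector keeps the sketch short; it is not necessarily the encoding used in the paper.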