Large Scale Spectral Clustering Based on Fast Landmark Sampling

YE Mao; LIU Wenfen

doi:10.11999/JEIT160260

Volume 39 Issue 2

Feb. 2017

Turn off MathJax

Article Contents

Article Navigation > Journal of Electronics & Information Technology > 2017 > 39(2): 278-284

YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

Citation:

YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

Citation:

YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

PDF( 439 KB)

Large Scale Spectral Clustering Based on Fast Landmark Sampling

doi: 10.11999/JEIT160260

YE Mao^{1
,
,},
LIU Wenfen¹

Funds:

The National 973 Program of China (2012CB315905), The National Natural Science Foundation of China (61502527, 61379150)

Received Date: 2016-03-21
Rev Recd Date: 2016-07-18
Publish Date: 2017-02-19

Abstract

Abstract

The applicability of traditional spectral clustering is limited by its high complexity in large-scale data sets. Through construction of affinity matrix between landmark points and data points, the Landmark-based Spectral Clustering (LSC) algorithm can significantly reduce the computational complexity of spectral embedding. It is vital for clustering results to apply the suitable strategies of the generation of landmark points. While considering big data problems, the existing generation strategies of landmark points face some deficiencies: the unstable results of random sampling, along with the unknown convergence time and the repeatability of data reading in k-means centers method. In this paper, a rapid landmark-sampling spectral clustering algorithm based on the approximate singular value decomposition is designed, which makes the sampling probability of each landmark point decided by the row norm of the approximate singular vector matrix. Compared with LSC algorithm based on random sampling, the clustering result of new algorithm is more stable and accurate; compared with LSC algorithm based on k-means centers, the new algorithm reduces the computational complexity. Moreover, the preservation of information in original data is analyzed for the landmark-sampling results theoretically. At the same time, the performance of new approach is verified by the experiments in some public data sets.
- Landmark sampling,
- Big data,
- Spectral clustering,
- Approximate singular value decomposition

FullText(HTML)

References(24)

References

何清, 李宁, 罗文娟, 等. 大数据下的机器学习算法综述[J]. 模式识别与人工智能, 2014, 27(4): 327-336.

HE Qing, LI Ning, LUO Wenjuan, et al. A survey of machine learning algorithms for big data[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(4): 327-336.

DING S, JIA H, ZHANG L, et al. Research of semi-supervised spectral clustering algorithm based on pairwise constraints[J]. Neural Computing and Applications, 2014, 24(1): 211-219. doi: 10.1007/s00521-012-1207-8.

NG A Y, JORDAN M I, and WEISS Y. On spectral clustering: Analysis and an algorithm[C]. Neural Information Processing Systems: Natural and Synthetic, Vancouver, Canada, 2001: 849-856.

FOWLKES C, BELONGIE S, CHUNG F, et al. Spectral grouping using the Nystrom method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(2): 214-225. doi: 10.1109/TPAMI.2004.1262185.

LI M, KWOK J T, and LU B L. Making large-scale Nystrm approximation possible[C]. Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 2010: 631-638.

LI M, BI W, KWORK J T, et al. Large-scale Nystrm kernel matrix approximation using randomized SVD[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(1): 152-164. doi: 10.1109/TNNLS.2014.2359798.

丁世飞, 贾洪杰, 史忠植. 基于自适应Nystrm 采样的大数据谱聚类算法[J]. 软件学报, 2014, 25(9): 2037-2049. doi: 10.13328/j.cnki.jos.004643.

DING Shifei, JIA Hongjie, and SHI Zhongzhi. Spectral clustering algorithm based on adaptive Nystrm sampling for big data analysis[J]. Journal of Software, 2014, 25(9): 2037-2049. doi: 10.13328/j.cnki.jos.004643.

YAN D, HUANG L, and JORDAN M I. Fast approximate spectral clustering[C]. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 2009: 907-916. doi: 10.1145/1557019.1557118.

CHEN X and CAI D. Large scale spectral clustering with landmark-based representation[C]. Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, California, USA, 2011: 313-318.

CAI D and CHEN X. Large scale spectral clustering via landmark-based sparse representation[J]. IEEE Transactions on Cybernetics, 2015, 45(8): 1669-1680. doi: 10.1109/TCYB. 2014.2358564.

BOUTSIDIS C, ZOUZIAS A, MAHONEY M W, et al. Randomized dimensionality reduction for-means clustering[J]. IEEE Transactions on Information Theory, 2015, 61(2): 1045-1062. doi: 10.1109/TIT.2014.2375327.

COHEN M, ELDER S, MUSCO C, et al. Dimensionality reduction for k-means clustering and low rank approximation[C]. Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, Portland, OR, USA, 2015: 163-172. doi: 10.1145/2746539.2746569.

KHOA N L D and CHAWLA S. A scalable approach to spectral clustering with SDD solvers[J]. Journal of Intelligent Information Systems, 2015, 44(2): 289-308. doi: 10.1007/ s10844-013-0285-0.

FRIEZE A, KANNAN R, and VEMPALA S. Fast Monte-Carlo algorithms for finding low-rank approximations[C]. Proceedings of the 39th Annual Symposium on Foundations of Computer Science, Palo Alto, California, USA, 1998: 370-378. doi: 10.1109/SFCS. 1998.743487.

DRINEAS P, MAHONEY M W, and MUTHUKRISHNAN S. Sampling algorithms for l2 regression and applications[C]. Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithm, Miami, Florida, USA, 2006: 1127-1136.

DRINES P, MAHONEY M W, and MUTHUKRISHNAN S. Subspace sampling and relative-error matrix approximation: Column-based methods[C]. 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems and 10th International Workshop on Randomization and Computation, Barcelona, Spain, 2006: 316-326. doi: 10.1007/11830924_30.

BOUTSIDIS C, DRINEAS P, and MAGDON-ISMAIL M. Near-optimal column-based matrix reconstruction [J]. SIAM Journal on Computing, 2014, 43(2): 687-717. doi: 10.1137/12086755X.

HALKO N, MARTINSSON P G, and TROPP J A. Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions[J]. SIAM Review, 2011, 53(2): 217-288. doi: 10.1137/090771806.

SARLOIS T. Improved approximation algorithms for large matrices via random projections[C]. Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, Berkeley, California, USA, 2006: 143-152. doi: 10.1109/FOCS.2006.37.

CHEN W Y, SONG Y, BAI H, et al. Parallel spectral clustering in distributed systems[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(3): 568-586. doi: 10.1109/TPAMI.2010.88.

AFAHAD A, ALSHATRI N, TARI Z, et al. A survey of clustering algorithms for big data: Taxonomy and empirical analysis[J]. IEEE Transactions on Emerging Topics in Computing, 2014, 2(3): 267-279. doi: 10.1109/TETC. 2014.2330519.

STREHL A and GHOSH J. Cluster ensemblesA knowledge reuse framework for combining multiple partitions[J]. The Journal of Machine Learning Research, 2003, 3: 583-617. doi: 10.1162/153244303321897735.

Relative Articles

Supplements(0)

Cited By

Proportional views