Large Scale Spectral Clustering Based on Fast Landmark Sampling

YE Mao, LIU Wenfen

Citation: YE Mao, LIU Wenfen. Large Scale Spectral Clustering Based on Fast Landmark Sampling[J]. Journal of Electronics & Information Technology, 2017, 39(2): 278-284. doi: 10.11999/JEIT160260

doi: 10.11999/JEIT160260

Funds: The National 973 Program of China (2012CB315905), The National Natural Science Foundation of China (61502527, 61379150)

  • Abstract: To avoid the high computational complexity that limits the application of traditional spectral clustering, landmark-based spectral clustering uses a similarity matrix between a set of landmark points and all data points, which effectively reduces the cost of the spectral embedding. On large data sets, however, the existing strategy of selecting landmarks at random makes the clustering results unstable, while the strategy of using k-means centers suffers from an unknown convergence time and repeated passes over the data. This paper applies approximate Singular Value Decomposition (SVD) to landmark-based spectral clustering and designs a fast landmark-sampling algorithm. The algorithm samples landmarks with probabilities computed from the lengths of the row vectors of the approximate singular vector matrix. Compared with random sampling, it guarantees the stability and accuracy of the clustering results; compared with the k-means-centers strategy, it reduces the computational complexity. The information about the original data preserved by the sampling is analyzed theoretically, and the performance of the algorithm is verified experimentally.
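To make the sampling idea in the abstract concrete, the following is a minimal Python sketch, not the paper's exact algorithm: it approximates the top-k left singular vectors with a randomized range finder, samples landmarks with probability proportional to the squared lengths of the rows of that approximate singular vector matrix, and then performs landmark-based spectral clustering. The function names, the Gaussian bandwidth sigma, the oversampling parameter, and the use of a dense point-to-landmark affinity matrix are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans


def sample_landmarks(X, p, k, oversample=10, seed=0):
    """Pick p landmark rows of X (n x d) with probabilities proportional to
    the squared lengths of the rows of an approximate top-k left singular
    vector matrix obtained via a randomized range finder (illustrative)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Sketch the column space of X with k + oversample random projections.
    Omega = rng.standard_normal((d, k + oversample))
    Q, _ = np.linalg.qr(X @ Omega)                   # orthonormal basis, n x (k+q)
    Ub, _, _ = np.linalg.svd(Q.T @ X, full_matrices=False)
    U_approx = Q @ Ub[:, :k]                         # approximate left singular vectors
    probs = (U_approx ** 2).sum(axis=1)              # row lengths -> sampling weights
    probs /= probs.sum()
    idx = rng.choice(n, size=p, replace=False, p=probs)
    return X[idx]


def landmark_spectral_clustering(X, n_clusters, p=500, sigma=1.0, seed=0):
    """Build an n x p point-to-landmark affinity matrix Z, then run k-means
    on the top left singular vectors of the row-normalized Z."""
    landmarks = sample_landmarks(X, p, n_clusters, seed=seed)
    d2 = cdist(X, landmarks, metric="sqeuclidean")   # squared distances, n x p
    Z = np.exp(-d2 / (2.0 * sigma ** 2))             # Gaussian affinities
    Z /= Z.sum(axis=1, keepdims=True)                # row-normalize
    U, _, _ = np.linalg.svd(Z, full_matrices=False)  # cheap: Z has only p columns
    emb = U[:, :n_clusters]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=n_clusters, n_init=10,
                  random_state=seed).fit_predict(emb)
```

Because the expensive eigendecomposition of the full n x n affinity matrix is replaced by the SVD of an n x p matrix, the cost of the spectral embedding grows linearly in n for a fixed number of landmarks p.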
Publication History
  • Received: 2016-03-21
  • Revised: 2016-07-18
  • Published: 2017-02-19
