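Abstract:
Co-clustering algorithms for Weblog data can discover user clusters and their corresponding page clusters at the same time, and have become an important research topic in Weblog data mining. Because most existing Weblog co-clustering algorithms use hard partitioning to assign users and pages to clusters, they cannot handle the cluster boundary problem well, namely that a user may belong to several clusters, and clustering quality suffers as a result. This paper presents a fuzzy co-clustering algorithm for Weblog data, FCOW (Fuzzy CO-clustering for Weblog), to address the boundary problem of co-clustering algorithms and improve the quality of the clustering results. The algorithm first uses the matrix Hadamard product to discover the independent user patterns PA = {pa1, …, paK} hidden in the Weblog; next, it assigns the remaining users to each independent pattern according to the page subset corresponding to pak, producing the co-clustering results {CSk, CPk}, k = 1, …, K; finally, it computes the fuzzy membership of every user and page in each co-cluster and uses these memberships as the basis for personalized recommendation. Experimental results show that FCOW is able to obtain high-quality clustering results.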
Abstract:
Weblog co-clustering, which finds user clusters and page clusters simultaneously, is an important research topic in Weblog mining. Most existing Weblog co-clustering algorithms use hard partitioning to assign users to their corresponding clusters. As a result, they cannot handle the cluster boundary problem well, in which a user may belong to more than one cluster, and this significantly degrades the quality of the clustering results. In this paper, a Fuzzy CO-clustering for Weblog (FCOW) algorithm is proposed to overcome this drawback of hard partitioning and improve the quality of Weblog co-clustering results. Specifically, the set of underlying user patterns PA = {pa1, …, paK} is first found using the Hadamard product; the remaining users are then assigned to their corresponding pattern pak according to its page subset, generating the co-clustering result {CSk, CPk}; finally, the fuzzy membership of each user in its page cluster CPk is calculated and used for recommendation. Experimental results on five real-world datasets show that FCOW improves the clustering quality of Weblog co-clustering.
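
The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual selection, assignment, or membership formulas, so everything below is an assumption made for illustration: the binary user-by-page matrix `visits`, the greedy rule that treats two users as independent when the Hadamard product of their rows is the zero vector, assignment by largest page overlap, and membership defined as a normalized overlap with each page cluster. All function names are hypothetical and this is not the FCOW implementation.

```python
def hadamard(u, v):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [a * b for a, b in zip(u, v)]


def find_independent_patterns(visits):
    """Greedily pick users that share no page with any already chosen pattern.

    Assumption: "independent" means the Hadamard product of the two
    users' rows is the zero vector.
    """
    patterns = []  # row indices of the users chosen as patterns pa_k
    for i, row in enumerate(visits):
        if sum(row) == 0:
            continue  # a user with no visits cannot define a pattern
        if all(sum(hadamard(row, visits[p])) == 0 for p in patterns):
            patterns.append(i)
    return patterns


def co_cluster(visits, patterns):
    """Build (CS_k, CP_k): CP_k is the page subset of pattern k (assumed),
    and each remaining user joins the pattern it overlaps with most."""
    if not patterns:
        return [], []
    user_clusters = [[p] for p in patterns]
    page_clusters = [[j for j, x in enumerate(visits[p]) if x] for p in patterns]
    for i, row in enumerate(visits):
        if i in patterns:
            continue
        overlaps = [sum(hadamard(row, visits[p])) for p in patterns]
        # Ties go to the earliest pattern; this is the hard assignment.
        best = max(range(len(patterns)), key=lambda k: overlaps[k])
        user_clusters[best].append(i)
    return user_clusters, page_clusters


def fuzzy_memberships(visits, page_clusters):
    """Membership of user i in cluster k, assumed here to be the share of
    the user's cluster-covered visits that fall in CP_k."""
    k = len(page_clusters)
    memberships = []
    for row in visits:
        hits = [sum(row[j] for j in pages) for pages in page_clusters]
        total = sum(hits)
        if total == 0:
            # No overlap with any cluster: spread membership uniformly.
            memberships.append([1.0 / k] * k if k else [])
        else:
            memberships.append([h / total for h in hits])
    return memberships


# Toy data: 4 users x 4 pages; user 3 sits on the boundary of both clusters.
visits = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
patterns = find_independent_patterns(visits)
user_clusters, page_clusters = co_cluster(visits, patterns)
memberships = fuzzy_memberships(visits, page_clusters)
```

On this toy matrix, user 3 is placed in a single cluster by the hard assignment but receives equal membership in both co-clusters, which is the boundary case that a fuzzy membership is meant to expose.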