Clustering Algorithms for Large-scale Social Networks Based on Structural Similarity
Abstract: To cluster social networks that are both directed and large-scale, this paper proposes a Structural Clustering Algorithm for Directed Networks (DirSCAN), based on structural similarity, and a corresponding distributed parallel algorithm (PDirSCAN). DirSCAN takes the directed interactions between nodes into account, groups nodes with similar behavioral structure, and analyzes node functions. To handle the scale of social networks, PDirSCAN runs under the MapReduce distributed parallel framework and improves processing performance while keeping the clustering results consistent. Extensive experiments on real-world network datasets show that DirSCAN improves F1 by up to 2.34% over the undirected network clustering algorithm SCAN, and that PDirSCAN runs 1.67 times faster than DirSCAN, so it can effectively handle large-scale directed network clustering.
Key words:
- Social networks
- Directed network clustering
- Parallel algorithm
- MapReduce
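
The abstract does not give DirSCAN's similarity formula, so the following is only a minimal sketch of the general idea of structural similarity. It uses the similarity measure of the undirected SCAN algorithm (shared closed neighborhood, normalized by the geometric mean of the neighborhood sizes) and applies it to out-neighbors as an assumed, purely illustrative way to respect edge direction. The names `closed_out_neighborhood`, `structural_similarity`, and the toy graph `edges` are hypothetical and not taken from the paper.

```python
from math import sqrt


def closed_out_neighborhood(adj, v):
    """Return the out-neighbors of v plus v itself (assumed directed variant)."""
    return set(adj.get(v, ())) | {v}


def structural_similarity(adj, u, v):
    """SCAN-style similarity: |N(u) & N(v)| / sqrt(|N(u)| * |N(v)|).

    Using out-neighborhoods here is an illustrative assumption,
    not the definition used by DirSCAN.
    """
    nu = closed_out_neighborhood(adj, u)
    nv = closed_out_neighborhood(adj, v)
    return len(nu & nv) / sqrt(len(nu) * len(nv))


# Toy directed graph as an adjacency list: node -> nodes it points to.
edges = {"a": ["c", "d"], "b": ["c", "d"], "c": ["a"], "d": []}

# "a" and "b" point to the same targets, so they score high;
# "c" and "d" share nothing in their closed out-neighborhoods.
sim_ab = structural_similarity(edges, "a", "b")
sim_cd = structural_similarity(edges, "c", "d")
```

In a SCAN-style algorithm, node pairs whose similarity exceeds a threshold are treated as structurally connected, and clusters grow from nodes that have enough such connections. Because each pairwise similarity depends only on two neighborhoods, the computation is a natural candidate for distribution across workers.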