Liu Yuan-chao, Wang Xiao-long, Liu Bing-quan, Zhong Bin-bin. The Clustering Analysis Technology for Information Retrieval[J]. Journal of Electronics & Information Technology, 2006, 28(4): 606-609.
The rapid development of information retrieval (IR) and search engines has greatly improved recall, whereas the gains in precision and in retrieval efficiency are less clear. Research on document clustering and multi-document keyword extraction can help address this problem. The basic idea is to cluster a subset of the documents returned by a search engine and to automatically extract keywords for each cluster, so that the user can judge whether the documents in each cluster are relevant to their need. This paper proposes the concepts of document relevancy and cluster relevancy. Both word frequency and the concept relevancy model of HOWNET are used to compute cluster relevancy, which in turn guides the cluster merging process. The experimental results show that IR efficiency is greatly improved.
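
The cluster-merging idea described above can be illustrated with a minimal Python sketch. It is not the paper's method: cluster relevancy is approximated here by cosine similarity over word frequencies only, the HOWNET concept relevancy component is omitted, and the function names (`term_freq`, `cosine`, `merge_clusters`, `keywords`) and the `threshold` value are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def term_freq(text):
    """Word-frequency vector for one document (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_clusters(docs, threshold=0.3):
    """Greedily merge the most similar pair of clusters until no pair
    reaches the (assumed) relevancy threshold.

    Each cluster is a tuple (document indices, summed word frequencies).
    """
    clusters = [([i], term_freq(d)) for i, d in enumerate(docs)]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(clusters[i][1], clusters[j][1])
                if score > best:
                    best, pair = score, (i, j)
        if best < threshold:
            break  # no remaining pair is relevant enough to merge
        i, j = pair
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


def keywords(cluster, k=3):
    """Top-k most frequent words of a cluster, used as its keyword label."""
    return [term for term, _ in cluster[1].most_common(k)]


docs = [
    "apple banana apple",
    "banana apple fruit",
    "car engine wheel",
    "engine car road",
]
clusters = merge_clusters(docs)
for members, _ in clusters:
    print(sorted(members))
```

A fuller version would filter stop words and combine the frequency score with a concept-level similarity, as the abstract describes; this sketch only shows how a relevancy score can drive the merging loop and how keywords can be read off each resulting cluster.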