Clustering Analysis Techniques in Information Retrieval

刘远超; 王晓龙; 刘秉权; 钟彬彬

Metrics
- Article views: 2367
- HTML full-text views: 181
- PDF downloads: 814
- Citations: 0
Publication history
- Received: 2005-01-10
- Revised: 2005-09-26
- Published: 2006-04-19

The Clustering Analysis Technology for Information Retrieval

Abstract

Abstract: The rapid development of information retrieval (IR) and search engine technology has considerably improved recall, whereas precision and the efficiency with which users obtain information have improved far less. Document clustering and automatic multi-document keyword extraction can help address this problem. The basic idea is to cluster a subset of the documents returned by a search and to automatically extract keywords for each cluster, so that the user can judge whether the documents in each cluster are relevant to the retrieval need. This paper proposes the concepts of document relevancy and cluster relevancy. Cluster relevancy is computed from word frequency information together with the word concept computation model of HowNet (HOWNET), and it serves as the criterion for merging clusters. Simulated information-acquisition experiments show that document retrieval efficiency improves considerably.

Keywords: document clustering; keyword extraction; HowNet; document relevancy
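
The cluster-then-label idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: cosine similarity over raw term frequencies stands in for the paper's cluster relevancy (which also uses the HowNet concept model, not reproduced here), the tokenizer simply splits on whitespace (Chinese text would need a word segmenter), and every function name and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from math import sqrt


def term_freq(text):
    """Term frequencies for a whitespace-tokenised text (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters.

    Used here as a stand-in for cluster relevancy (an assumption).
    """
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(docs, threshold=0.3):
    """Greedy agglomerative merging of documents.

    Each cluster is a pair (member indices, summed term frequencies).
    The most similar pair of clusters is merged repeatedly until no
    pair reaches `threshold` (a hypothetical stopping rule).
    """
    clusters = [([i], term_freq(d)) for i, d in enumerate(docs)]
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine(clusters[i][1], clusters[j][1])
                if score > best_score:
                    best_score, best_pair = score, (i, j)
        if best_score < threshold:
            break
        i, j = best_pair
        merged = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def cluster_keywords(cluster, top_n=3):
    """Most frequent terms of a cluster, used here as its keywords."""
    return [term for term, _ in cluster[1].most_common(top_n)]
```

Calling `cluster_documents` on a list of retrieved document texts returns the clusters, and `cluster_keywords` on each cluster gives the labels a user would scan to decide which cluster to open. Replacing `cosine` with a relevancy function that also scores conceptual similarity between words would bring the sketch closer to the approach the abstract describes.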
