Research on Sentence Optimum Selection Algorithm for Multi-Document Summarization
doi: 10.3724/SP.J.1146.2007.00876
Abstract: This paper analyzes the problem of sentence selection in summarization and proposes an optimum selection method for summary sentences. Unlike the traditional approach of building a summary by adding sentences one at a time, the proposed method generates the summary by deleting sentences one at a time from a bounded candidate set. Sentence selection proceeds in two stages. The first stage obtains the candidate summary sentence set, using a direct obtaining algorithm and a redundancy-based obtaining algorithm. The second stage deletes sentences step by step, using different feature items to measure each sentence's contribution to the candidate summary sentence set; this stage is the sentence optimum selection algorithm. With DUC 2004 as the test corpus, the ROUGE scores of the summaries generated after sentence selection confirm that sentence selection is a necessary step in multi-document summary generation, and a comparison with the redundancy-based sentence selection method confirms the effectiveness of the proposed algorithm.
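The deletion-based second stage can be illustrated with a minimal sketch. The abstract does not specify the feature items or the contribution measure, so everything concrete here is an assumption made for illustration: terms are weighted by their frequency in the candidate set, a sentence's contribution is the weighted term coverage the set loses when that sentence is removed, and the names `tokenize`, `coverage`, and `prune_to_budget` are hypothetical.

```python
from collections import Counter

PUNCT = ".,;:!?"


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in sentence.split() if w.strip(PUNCT)]


def coverage(sentences, weights):
    """Sum of the weights of the distinct terms covered by a sentence set."""
    covered = set()
    for s in sentences:
        covered.update(tokenize(s))
    return sum(weights[t] for t in covered)


def prune_to_budget(candidates, max_words):
    """Delete the least-contributing sentence until the set fits max_words.

    Assumption: contribution = coverage lost when the sentence is removed.
    This is an illustrative feature, not the one used in the paper.
    """
    # Term weights are fixed from the initial candidate set (assumed feature).
    weights = Counter(t for s in candidates for t in tokenize(s))
    selected = list(candidates)
    while len(selected) > 1 and sum(len(tokenize(s)) for s in selected) > max_words:
        full = coverage(selected, weights)
        losses = [
            full - coverage(selected[:i] + selected[i + 1:], weights)
            for i in range(len(selected))
        ]
        # Drop the sentence whose removal costs the least coverage.
        drop = min(range(len(selected)), key=lambda i: losses[i])
        del selected[drop]
    return selected


candidates = [
    "The flood hit the city.",
    "The flood hit the city on Monday.",
    "Rescue teams arrived quickly.",
]
summary = prune_to_budget(candidates, max_words=12)
```

In this toy input the first sentence is fully subsumed by the second, so removing it loses no coverage and it is deleted first. A redundant sentence therefore drops out without an explicit redundancy check, which is one plausible motivation for selecting by deletion rather than by addition.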