A Blog Summarization Method Based on the HITS Algorithm  Cited by: 7

A New HITS-Based Summarization Approach for Blog


Authors: Miao Jia (苗家)[1], Ma Jun (马军)[1], Chen Zhumin (陈竹敏)[1]

Affiliation: [1] School of Computer Science and Technology, Shandong University, Jinan, Shandong 250101

Source: Journal of Chinese Information Processing (《中文信息学报》), 2011, No. 1, pp. 104-109 (6 pages)

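Funding: National Natural Science Foundation of China (60970047); Shandong Province Science and Technology Key Project (2007GG10001002; 2008GG10001026); Natural Science Foundation of Shandong Province (Y2008G19)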

Abstract: A blog post typically attracts a large number of comments, and those comments contain considerable noise, so extracting the main content of a post with the help of its comments is a difficult problem for many blog-based applications. Most previously proposed summarization methods are general-purpose methods for multi-document summarization; they do not account for the particular characteristics of blog posts and cannot make effective use of comments. Based on an analysis of those characteristics, this paper proposes a new blog summarization method that incorporates comment information. The method first computes a weight for each comment from multiple comment features, then applies the HITS algorithm on a graph model to obtain weights for the sentences of the post, and finally selects summary sentences according to those weights. Experiments on an Ifeng blog dataset show that the method outperforms previous methods in terms of ROUGE scores.
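The pipeline outlined in the abstract (weight the comments, run HITS over a graph linking comments to post sentences, pick the top-weighted sentences) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the comment features, the graph construction, or the edge weights, so the Jaccard token overlap, the treatment of comments as hubs and sentences as authorities, and every function name here (`overlap`, `hits_sentence_scores`, `summarize`) are assumptions. The comment weights are taken as given inputs.

```python
from typing import List, Sequence, Set


def overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets (an assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def hits_sentence_scores(
    sentences: Sequence[Set[str]],
    comments: Sequence[Set[str]],
    comment_weights: Sequence[float],
    iterations: int = 50,
) -> List[float]:
    """Score post sentences (authorities) against weighted comments (hubs)."""
    n_s, n_c = len(sentences), len(comments)
    # Assumed edge weight: comment prior weight times token overlap.
    edges = [
        [comment_weights[j] * overlap(sentences[i], comments[j]) for j in range(n_c)]
        for i in range(n_s)
    ]
    auth = [1.0] * n_s
    hub = [1.0] * n_c
    for _ in range(iterations):
        # Authority update: a sentence gains weight from the hubs linked to it.
        auth = [sum(edges[i][j] * hub[j] for j in range(n_c)) for i in range(n_s)]
        norm = sum(a * a for a in auth) ** 0.5
        if norm == 0.0:
            return [0.0] * n_s  # no comment overlaps any sentence
        auth = [a / norm for a in auth]
        # Hub update: a comment gains weight from the sentences it links to.
        hub = [sum(edges[i][j] * auth[i] for i in range(n_s)) for j in range(n_c)]
        norm = sum(h * h for h in hub) ** 0.5
        if norm == 0.0:
            break
        hub = [h / norm for h in hub]
    return auth


def summarize(sentence_texts: Sequence[str], scores: Sequence[float], k: int) -> List[str]:
    """Return the k highest-scoring sentences in their original order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentence_texts[i] for i in sorted(top)]
```

In this sketch a noisy comment can be suppressed in two ways: a low prior weight shrinks all of its edges, and a comment that shares no tokens with the post contributes nothing to any sentence score.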

Keywords: automatic document summarization; blog; comments; HITS

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
