Affiliation: [1] School of Computer Science and Technology, Shandong University, Jinan, Shandong 250101, China
Source: Journal of Chinese Information Processing (《中文信息学报》), 2011, No. 1, pp. 104-109 (6 pages)
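Funding: National Natural Science Foundation of China (60970047); Shandong Province Science and Technology Research Program (2007GG10001002; 2008GG10001026); Shandong Provincial Natural Science Foundation (Y2008G19)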
Abstract: Blog posts attract large numbers of comments, and those comments contain a great deal of noise, so summarizing the main content of a post together with its comments is a hard problem for many blog-based applications. Most previously proposed summarization methods are general-purpose methods designed for multi-document summarization; they do not account for the particular characteristics of blog posts and cannot make effective use of the comments. This paper analyzes the characteristics of blogs and proposes a new blog summarization method that incorporates comment information. The method first computes a weight for each comment from multiple features of the comment, then applies the HITS algorithm over a graph model to obtain a weight for each sentence of the post, and finally selects summary sentences according to those weights. Experiments on a dataset from Ifeng Blog (凤凰博客) show that the method outperforms previous methods on the ROUGE measure.
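The pipeline in the abstract (comment weights, then HITS over a graph, then sentence selection) can be illustrated with a minimal sketch. The abstract does not say which comment features are used or how the graph is built, so everything concrete here is an assumption: comment weights are taken as given inputs, the graph is a bipartite comment-to-sentence graph, and each edge is the comment weight multiplied by a bag-of-words cosine similarity. The names `cosine`, `rank_sentences`, and `summarize` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists (bag-of-words counts)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def _normalize(vec):
    """Scale a vector to unit L2 norm; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def rank_sentences(sentences, comments, comment_weights, iterations=50):
    """Score post sentences with HITS on a comment -> sentence graph.

    Assumption (not from the paper): edge(c, s) = weight(c) * cosine(c, s).
    Comments act as hubs, sentences as authorities.
    """
    edges = [[w * cosine(c, s) for s in sentences]
             for c, w in zip(comments, comment_weights)]
    n_c, n_s = len(comments), len(sentences)
    hub = [1.0] * n_c
    auth = [1.0] * n_s
    for _ in range(iterations):
        # Authority update: a sentence is important if weighty comments point to it.
        auth = _normalize([sum(edges[i][j] * hub[i] for i in range(n_c))
                           for j in range(n_s)])
        # Hub update: a comment is a good hub if it points to important sentences.
        hub = _normalize([sum(edges[i][j] * auth[j] for j in range(n_s))
                          for i in range(n_c)])
    return auth


def summarize(sentences, comments, comment_weights, k=1):
    """Return the k highest-scoring sentences, kept in original post order."""
    scores = rank_sentences(sentences, comments, comment_weights)
    top = sorted(range(len(sentences)), key=lambda j: scores[j], reverse=True)[:k]
    return [sentences[j] for j in sorted(top)]
```

In this sketch a noisy comment with a low weight contributes little to any sentence score, which is one plausible way comment weighting could suppress noise; the paper's actual weighting and graph construction may differ.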
Classification: TP391 [Automation and Computer Technology: Computer Application Technology]