A Clustering-Based Method for Automatic Summarization of Articles and Its Implementation

Times cited: 1


Authors: Tang Jianquan (唐建权), He Hongbo (何洪波)[1], Wang Runqiang (王闰强)[1] (Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China)

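Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190; [2] University of Chinese Academy of Sciences, Beijing 100049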

Source: E-science Technology & Application (《科研信息化技术与应用》), 2019, No. 1, pp. 12-19 (8 pages)

Funding: Chinese Academy of Sciences 13th Five-Year Plan Special Project for Informatization Construction (XXH13504-04)

Abstract: This paper proposes a clustering-based method for automatic summarization. The method converts every sentence in an article into a sentence vector, removes outliers, and clusters the remaining vectors with an AGNES-based algorithm, yielding several clusters that each contain no more than a preset threshold number of sentence vectors. After "useless" clusters containing very few sentence vectors are deleted, the one or two sentences that best represent the semantics of each cluster are selected. These candidates are ranked in ascending order of their distance to the semantic center, together with other information, and a specified number of them are taken as key sentences and concatenated into the summary. In experiments on 500 popular-science articles randomly selected from the Science Museums of China (中国科普博览) website, the method outperformed both the graph-based TextRank algorithm and a statistics-based method on every evaluation metric, demonstrating its effectiveness and usability.
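The pipeline described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes sentence vectors are already computed (the abstract does not say how), uses Euclidean distance, an assumed "mean + k * std from the global centroid" rule for outlier removal, centroid linkage for the AGNES-style merging, an extra distance cutoff `max_dist` alongside the size cap, and an assumed size rule (`two_rep_size`) for choosing one or two representatives per cluster. All function and parameter names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def remove_outliers(idx: List[int], vecs: List[Vector], k: float = 2.0) -> List[int]:
    """Assumed rule: drop vectors farther than mean + k * std from the global centroid."""
    c = centroid([vecs[i] for i in idx])
    d = {i: dist(vecs[i], c) for i in idx}
    mean = sum(d.values()) / len(d)
    std = math.sqrt(sum((x - mean) ** 2 for x in d.values()) / len(d))
    # Small tolerance so identical vectors are never dropped because of rounding.
    return [i for i in idx if d[i] <= mean + k * std + 1e-12]


def agnes_capped(idx: List[int], vecs: List[Vector],
                 max_size: int, max_dist: float) -> List[List[int]]:
    """AGNES-style bottom-up clustering with a cluster-size cap.

    Start from singletons and repeatedly merge the closest pair of clusters
    (centroid linkage, an assumption) as long as the merged cluster has at
    most max_size members and the pair is no farther apart than max_dist.
    """
    clusters = [[i] for i in idx]
    while True:
        cents = [centroid([vecs[i] for i in c]) for c in clusters]
        best = None  # (distance, a, b) of the closest mergeable pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if len(clusters[a]) + len(clusters[b]) > max_size:
                    continue
                d = dist(cents[a], cents[b])
                if d <= max_dist and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected


def summarize(sentences: List[str], vecs: List[Vector], n_key: int = 3,
              max_size: int = 5, max_dist: float = 1.0, min_size: int = 2,
              two_rep_size: int = 4, k: float = 2.0) -> str:
    idx = remove_outliers(list(range(len(sentences))), vecs, k)
    # Drop "useless" clusters with fewer than min_size sentences.
    clusters = [c for c in agnes_capped(idx, vecs, max_size, max_dist)
                if len(c) >= min_size]
    candidates: List[Tuple[float, int]] = []  # (distance to cluster centre, sentence index)
    for c in clusters:
        cen = centroid([vecs[i] for i in c])
        ranked = sorted(c, key=lambda i: dist(vecs[i], cen))
        # Assumed rule: two representatives from clusters with >= two_rep_size members.
        take = 2 if len(c) >= two_rep_size else 1
        candidates += [(dist(vecs[i], cen), i) for i in ranked[:take]]
    # Keep the n_key candidates closest to their cluster centres,
    # then restore article order before splicing them (assumption).
    chosen = sorted(candidates)[:n_key]
    return " ".join(sentences[i] for _, i in sorted(chosen, key=lambda t: t[1]))


if __name__ == "__main__":
    sents = ["Cats purr.", "Cats nap.", "Cats groom.", "Rockets launch.",
             "Rockets orbit.", "Rockets land.", "Tea is hot.", "Lorem ipsum."]
    vs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
          (5.1, 5.0), (5.0, 5.1), (2.5, 9.0), (50.0, 50.0)]
    print(summarize(sents, vs))  # prints: Cats purr. Rockets launch.
```

With the toy 2-D vectors above, the point at (50, 50) is removed as an outlier, the lone "Tea is hot." cluster is discarded as too small, and the sentence nearest each remaining cluster centre is kept. Centroid linkage was chosen only because it keeps the sketch short; AGNES works with any linkage, and the paper's actual choice is not stated in the abstract.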

Keywords: automatic summarization; clustering; AGNES; cluster semantics

CLC classification number: G63 [Culture and Science: Education]

 
