Authors: Tang Jianquan (唐建权), He Hongbo (何洪波) [1], Wang Runqiang (王闰强) [1]
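Affiliations: [1] Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China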
Source: E-science Technology & Application (《科研信息化技术与应用》), 2019, No. 1, pp. 12-19 (8 pages)
Funding: Chinese Academy of Sciences 13th Five-Year Plan Special Project for Informatization Construction (XXH13504-04)
Abstract: This paper proposes a clustering-based method for automatic summarization. The method first converts every sentence in an article into a sentence vector and removes outliers. It then clusters the remaining vectors with an AGNES-based algorithm, producing clusters that each contain no more than a set threshold number of sentence vectors. After discarding "useless" clusters that contain very few sentence vectors, it selects from each cluster the one or two sentences that best represent the cluster's semantics. Finally, using the distance from each sentence to its semantic center along with other information, it ranks the candidates in ascending order, takes a specified number of them as key sentences, and splices them into the article summary. In experiments on 500 popular science articles randomly selected from the Science Museums of China website (中国科普博览), the method outperforms both the graph-based TextRank algorithm and a statistics-based method on every metric, which demonstrates its effectiveness and usability.
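The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how sentence vectors are built, how outliers are detected, or which linkage is used, so the bag-of-words vectors, the mean-plus-two-standard-deviations outlier rule, average linkage, and the parameter names `max_size`, `min_size`, and `num_key` are all hypothetical choices. The sketch also picks only one representative per cluster and ranks by distance alone, whereas the paper selects one or two and uses additional information.

```python
import math
import re
from collections import Counter


def sentence_vectors(sentences):
    """Bag-of-words count vectors (an assumed stand-in for real sentence embeddings)."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def drop_outliers(indices, vectors, z=2.0):
    """Assumed rule: drop sentences farther than mean + z*std from the global centroid."""
    center = centroid([vectors[i] for i in indices])
    dists = {i: distance(vectors[i], center) for i in indices}
    mean = sum(dists.values()) / len(dists)
    std = math.sqrt(sum((d - mean) ** 2 for d in dists.values()) / len(dists))
    return [i for i in indices if dists[i] <= mean + z * std]


def agnes(indices, vectors, max_size):
    """Bottom-up (AGNES-style) merging with average linkage.

    A merge is allowed only if the merged cluster would hold at most
    max_size sentences; merging stops when no such pair remains.
    """
    clusters = [[i] for i in indices]

    def linkage(c1, c2):
        total = sum(distance(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while True:
        candidates = [
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
            if len(clusters[p]) + len(clusters[q]) <= max_size
        ]
        if not candidates:
            return clusters
        _, p, q = min(candidates)
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]


def summarize(sentences, max_size=4, min_size=2, num_key=2):
    """Return up to num_key key sentences, in their original order."""
    vectors = sentence_vectors(sentences)
    kept = drop_outliers(list(range(len(sentences))), vectors)
    # Discard "useless" clusters that contain very few sentences.
    clusters = [c for c in agnes(kept, vectors, max_size) if len(c) >= min_size]
    scored = []
    for cluster in clusters:
        center = centroid([vectors[i] for i in cluster])
        best = min(cluster, key=lambda i: distance(vectors[i], center))
        scored.append((distance(vectors[best], center), best))
    # Smallest distance first, then restore document order for readability.
    chosen = sorted(i for _, i in sorted(scored)[:num_key])
    return [sentences[i] for i in chosen]
```

Calling `summarize(list_of_sentences, max_size=3, min_size=2, num_key=2)` returns at most two sentences, each the member closest to the centroid of a surviving cluster. Swapping `sentence_vectors` for a trained embedding model would bring the sketch closer to a practical setup without changing the rest of the flow.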