Authors: HUANG Haoning; CHEN Zhimin; XU Cong; ZHANG Xiaoyan[3]
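Affiliations: [1] National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] State Radio Monitoring Center, Beijing 100037, China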
Source: Journal of Beijing University of Aeronautics and Astronautics, 2024, No. 1, pp. 317-327 (11 pages)
Funding: National Natural Science Foundation of China (91738101); National Key R&D Program of China (2020YFB1807900)
Abstract: The vast amount of aerospace news on the Internet contains a great deal of hidden aerospace intelligence, and understanding and compressing this news is the basis for more efficient downstream intelligence analysis. However, general-purpose automatic summarization algorithms tend to miss much of the key information specific to the aerospace domain, while supervised summarization algorithms require large amounts of annotated domain text, which is time-consuming and laborious to produce. We therefore propose DCG-TextRank, an unsupervised automatic summarization model based on a domain concept graph, which uses domain terms to help guide graph ranking and improve the model's understanding of domain text. The model consists of three modules: domain concept graph generation, graph weight initialization, and graph ranking with semantic filtering. First, the text is converted into a domain concept graph containing sentence nodes and domain-term nodes, based on sentence-vector similarity and a domain term database. Next, the graph weights are initialized according to the characteristics of aerospace news text. Finally, the TextRank algorithm ranks the sentences, and the semantic filtering module refines the TextRank output by clustering graph nodes and setting a semantic retention degree for the summary, which preserves the multiple semantic aspects of the text while reducing redundancy. The proposed model is portable to other domains. Experimental results on an aerospace news dataset show that it improves performance by 14.97% over the conventional TextRank model and by 4.37% to 12.97% over the supervised extractive summarization models BertSum and MatchSum.
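The abstract gives no implementation details, so the following is only a minimal Python sketch of the general idea: a graph with sentence nodes and domain-term nodes, ranked with a weighted PageRank (TextRank-style) iteration. It rests on several assumptions that are not from the paper: bag-of-words cosine similarity stands in for the pre-trained sentence vectors, every sentence-term edge gets a fixed weight of 1.0 instead of the paper's feature-based weight initialization, and plain top-k selection replaces the clustering-based semantic filtering module. The function names `build_graph`, `textrank` and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts; a stand-in (assumption) for pre-trained sentence vectors."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_graph(sentences, terms, term_weight=1.0):
    """Symmetric weighted adjacency dict over sentence nodes ("s", i) and term nodes ("t", term).

    Sentence-sentence edges use cosine similarity; a sentence-term edge with the
    fixed weight `term_weight` (hypothetical) is added when the sentence contains the term.
    """
    bows = [bag_of_words(s) for s in sentences]
    nodes = [("s", i) for i in range(len(sentences))] + [("t", t) for t in terms]
    adj = {n: {} for n in nodes}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            w = cosine(bows[i], bows[j])
            if w > 0:
                adj[("s", i)][("s", j)] = w
                adj[("s", j)][("s", i)] = w
        for t in terms:
            if t.lower() in sentences[i].lower():
                adj[("s", i)][("t", t)] = term_weight
                adj[("t", t)][("s", i)] = term_weight
    return adj


def textrank(adj, d=0.85, iters=100, tol=1e-8):
    """Weighted PageRank by power iteration; scores sum to 1 when no node is isolated."""
    nodes = list(adj)
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    strength = {v: sum(adj[v].values()) for v in nodes}
    for _ in range(iters):
        new = {}
        for v in nodes:
            # Each neighbour u passes on its score in proportion to the edge weight.
            incoming = sum(score[u] * w / strength[u] for u, w in adj[v].items())
            new[v] = (1 - d) / n + d * incoming
        delta = max(abs(new[v] - score[v]) for v in nodes)
        score = new
        if delta < tol:
            break
    return score


def summarize(sentences, terms, k=2):
    """Indices of the k highest-ranked sentences, returned in original document order."""
    score = textrank(build_graph(sentences, terms))
    ranked = sorted(range(len(sentences)), key=lambda i: score[("s", i)], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    news = [
        "The Long March 5 rocket launched the Tianwen probe.",
        "The Tianwen probe entered Mars orbit after the launch.",
        "Officials held a press conference in the afternoon.",
        "The rocket and probe performed as planned.",
    ]
    print(summarize(news, ["Long March 5", "Tianwen", "Mars orbit"], k=2))
```

In this toy example, the term nodes add extra edges to the sentences that mention domain terms, so those sentences collect more score than the generic press-conference sentence. This is the intuition behind using domain terms to guide the ranking, although the paper's actual weighting and filtering are not reproduced here.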
Keywords: automatic text summarization; domain concept graph; pre-trained language model; graph ranking algorithm; graph node clustering
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]