Automatic summarization model of aerospace news based on domain concept graph  (Cited by: 1)


Authors: HUANG Haoning, CHEN Zhimin, XU Cong, ZHANG Xiaoyan [3]

Affiliations: [1] National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China; [2] University of Chinese Academy of Sciences, Beijing 100049, China; [3] State Radio Monitoring Center, Beijing 100037, China

Source: Journal of Beijing University of Aeronautics and Astronautics, 2024, No. 1, pp. 317-327 (11 pages)

Funding: National Natural Science Foundation of China (91738101); National Key Research and Development Program of China (2020YFB1807900).

Abstract: The vast amount of aerospace news on the Internet contains a great deal of aerospace intelligence, and understanding and compressing it is the basis for making subsequent intelligence analysis more efficient. However, general-purpose automatic summarization algorithms tend to ignore much key information specific to the aerospace domain, and supervised summarization algorithms require large amounts of annotated domain text, which is time-consuming and laborious. This paper therefore proposes an unsupervised automatic summarization model based on a domain concept graph (DCG-TextRank), which uses domain terms to guide graph ranking and improve the model's understanding of domain text. The model has three modules: domain concept graph generation, graph weight initialization, and graph ranking with semantic filtering. First, the text is transformed into a domain concept graph containing sentence nodes and domain term nodes, according to sentence vector similarity and a domain term database. Second, the weights of the domain concept graph are initialized according to the features of aerospace news text. Third, the TextRank algorithm ranks the sentences, and the semantic filtering module improves the TextRank output by clustering graph nodes and setting a semantic retention level for the summary, which preserves the multiple semantic threads of the text while reducing redundancy. The proposed model is portable across domains. Experimental results on an aerospace news dataset show that it outperforms the conventional TextRank model by 14.97% and the supervised extractive summarization models BertSum and MatchSum by 4.37% to 12.97%.
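The pipeline outlined in the abstract (a graph of sentence nodes and domain term nodes, TextRank ranking, then a redundancy-reducing selection step) can be illustrated with a minimal sketch. This is not the authors' DCG-TextRank implementation: the abstract does not give the weighting scheme, the sentence encoder, or the clustering method. The sketch assumes bag-of-words cosine similarity in place of sentence vectors, a uniform `term_weight` for sentence-to-term edges, and a simple similarity threshold `max_sim` in place of graph node clustering and semantic retention. All function and parameter names are hypothetical.

```python
import math
import re
from collections import Counter


def bow(sentence):
    """Bag-of-words vector (assumed stand-in for a learned sentence vector)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_graph(sentences, terms, term_weight=1.0):
    """Weighted undirected graph: nodes 0..n-1 are sentences, the rest are terms."""
    n = len(sentences)
    size = n + len(terms)
    w = [[0.0] * size for _ in range(size)]
    vecs = [bow(s) for s in sentences]
    # Sentence-sentence edges weighted by similarity.
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = cosine(vecs[i], vecs[j])
    # Sentence-term edges (assumed uniform weight) when the term occurs in the sentence.
    for k, term in enumerate(terms):
        for i, s in enumerate(sentences):
            if term.lower() in s.lower():
                w[i][n + k] = w[n + k][i] = term_weight
    return w, vecs


def textrank(w, damping=0.85, iters=100, tol=1e-8):
    """Weighted PageRank by power iteration over the adjacency matrix w."""
    size = len(w)
    out = [sum(row) for row in w]
    score = [1.0 / size] * size
    for _ in range(iters):
        new = [
            (1 - damping) / size
            + damping * sum(
                w[j][i] / out[j] * score[j] for j in range(size) if out[j] > 0
            )
            for i in range(size)
        ]
        delta = sum(abs(a - b) for a, b in zip(new, score))
        score = new
        if delta < tol:
            break
    return score


def summarize(sentences, terms, k=2, max_sim=0.5):
    """Pick up to k high-ranked sentences, skipping near-duplicates of chosen ones."""
    w, vecs = build_graph(sentences, terms)
    score = textrank(w)
    ranked = sorted(range(len(sentences)), key=lambda i: score[i], reverse=True)
    chosen = []
    for i in ranked:
        if all(cosine(vecs[i], vecs[j]) < max_sim for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    return [sentences[i] for i in sorted(chosen)]  # restore document order


sentences = [
    "The rocket launched a communications satellite into orbit.",
    "The rocket launched a communications satellite into orbit on Monday.",
    "Local weather was sunny during the event.",
    "Engineers will test the satellite payload next week.",
]
terms = ["satellite", "rocket", "payload"]
summary = summarize(sentences, terms, k=2)
```

Term nodes give sentences that mention domain vocabulary extra incoming rank, which is the intuition behind using domain terms to guide ranking. The threshold filter keeps the two near-identical launch sentences from both appearing in the summary.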

Keywords: automatic text summarization; domain concept graph; pre-trained language model; graph ranking algorithm; graph node clustering

Classification code: TP391.1 [Automation and Computer Technology: Computer Application Technology]

 
