Research on automatic text summarization combining topic feature

Cited by: 5


Authors: Luo Fang[1], Wang Jinghang, He Daosen, Pu Qiumei[3]

Affiliations: [1] School of Computer Science & Technology, Wuhan University of Technology, Wuhan 430063, China; [2] Dept. of Supply Chain & Information Management, Hang Seng University of Hong Kong, Hong Kong 999077, China; [3] School of Information Engineering, Minzu University of China, Beijing 100081, China

Source: Application Research of Computers, 2021, No. 1, pp. 129-133 (5 pages)

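Funding: Humanities and Social Sciences Research Planning Fund of the Ministry of Education of China (18YJAZH087); Independent Innovation Research Fund of Wuhan University of Technology (3120600100).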

Abstract: Traditional graph-model methods for text summarization consider only statistical features or shallow semantic features, and do not mine or exploit deeper topic-level semantic features. To address this, the paper proposes MDSR (multi-dimension summarization rank), an automatic text summarization method that incorporates topic features and scores sentences along multiple dimensions. First, the LDA topic model is used to mine the topic semantic information of the text, and a topic importance measure is defined to quantify how much topic features contribute to the importance of a sentence. Next, topic features, statistical features, and inter-sentence similarity are combined to improve the construction of the probability transition matrix over the graph-model nodes. Finally, the summary is extracted and evaluated according to the weights of the sentence nodes. The experimental results show that MDSR reaches its best ROUGE scores when the weights of topic features, statistical features, and inter-sentence similarity are in the ratio 3:4:3, with ROUGE-1, ROUGE-2, and ROUGE-SU4 reaching 53.35%, 35.18%, and 33.86%, respectively, which is better than the comparison methods. This indicates that incorporating topic features effectively improves the accuracy of summary extraction.
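The abstract does not give the paper's formulas, so the following is only a minimal sketch of the general idea: blend three per-sentence signals into a row-stochastic transition matrix using the reported 3:4:3 ratio, then rank sentences with a TextRank-style power iteration. The function names, the way each feature is normalized, the damping factor of 0.85, and the toy input values are all assumptions made for illustration, not the authors' MDSR definition. In practice the topic scores would come from an LDA model and the similarity matrix from a sentence-similarity measure.

```python
def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def transition_matrix(similarity, topic, stat, weights=(0.3, 0.4, 0.3)):
    """Build a row-stochastic matrix from three signals (assumed blend).

    similarity: n x n inter-sentence similarity matrix.
    topic, stat: per-sentence topic and statistical scores (hypothetical inputs).
    weights: (topic, statistical, similarity), here the 3:4:3 ratio.
    """
    w_topic, w_stat, w_sim = normalize(list(weights))
    n = len(similarity)
    matrix = []
    for i in range(n):
        # Exclude the self-loop i -> i, then turn each signal into a
        # probability distribution over the remaining target sentences.
        t = normalize([0.0 if j == i else topic[j] for j in range(n)])
        s = normalize([0.0 if j == i else stat[j] for j in range(n)])
        m = normalize([0.0 if j == i else similarity[i][j] for j in range(n)])
        matrix.append([w_topic * t[j] + w_stat * s[j] + w_sim * m[j]
                       for j in range(n)])
    return matrix


def rank_sentences(similarity, topic, stat, weights=(0.3, 0.4, 0.3),
                   damping=0.85, max_iter=200, tol=1e-10):
    """Score sentences by damped power iteration on the blended matrix."""
    p = transition_matrix(similarity, topic, stat, weights)
    n = len(p)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        updated = [(1.0 - damping) / n
                   + damping * sum(scores[i] * p[i][j] for i in range(n))
                   for j in range(n)]
        delta = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if delta < tol:
            break
    return scores


if __name__ == "__main__":
    # Toy values, invented for illustration only.
    sim = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
    topic_scores = [0.6, 0.3, 0.1]
    stat_scores = [0.5, 0.3, 0.2]
    ranking = rank_sentences(sim, topic_scores, stat_scores)
    top = sorted(range(len(ranking)), key=lambda k: ranking[k], reverse=True)
    print(top)
```

Each of the three signals is normalized per row before blending, so every row of the matrix sums to 1 and the scores remain a probability distribution. A summary would then be formed by taking the top-ranked sentences in their original order.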

Keywords: TextRank; text summarization; semantic features; topic model; probability transition matrix

Classification: TP391 [Automation and Computer Technology: Computer Application Technology]

 
