Multi-document Summarization Based on M-C-G Neural Network  Cited by: 1

Authors: GAO Yang (高阳); YAN Ren-wu (闫仁武); YUAN Shuang-shuang (袁双双)

Affiliations: [1] School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212000, Jiangsu, China; [2] School of Software, Southeast University, Suzhou 215000, Jiangsu, China

Source: Software Guide (《软件导刊》), 2020, No. 10, pp. 39-45 (7 pages)

Funding: National Natural Science Foundation of China (61772244).

Abstract: To address the information overload that massive data imposes on users, this paper analyzes news pages from websites such as People's Daily Online (人民网) and Sina (新浪网) and, combining traditional methods with deep learning, proposes a multi-document summarization method (M-C-G) based on feature fusion, a convolutional neural network (CNN), and a gated recurrent unit (GRU). First, news pages covering 30 different topics are cleaned, a word vector model is trained with the word2vec tool, and a preliminary summary is computed from multiple features. Next, 83,000 Sohu news texts are used to train a Seq2Seq model that incorporates CNN and GRU. Finally, the preliminary summary is fed into the trained model to produce the final summary. Experimental results show that, under the ROUGE evaluation framework, the method improves accuracy by at least about 2% over existing methods and can effectively help users find valuable text information.
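The abstract reports its gains under the ROUGE evaluation framework without saying which variant was used. As a rough illustration only, the sketch below computes ROUGE-N recall, precision, and F1 from clipped n-gram overlap between a candidate summary and a single reference. The function names `ngrams` and `rouge_n`, the whitespace tokenization, and the toy sentences are assumptions made for this example and are not taken from the paper; for Chinese news text the inputs would first need word or character segmentation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Compute ROUGE-N recall, precision and F1 for pre-tokenized inputs.

    Hypothetical helper for illustration; not the paper's evaluation script.
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Counter intersection keeps the minimum count per n-gram (clipped matches).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Toy example with assumed English sentences, split on whitespace.
reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()
scores_1 = rouge_n(candidate, reference, n=1)
scores_2 = rouge_n(candidate, reference, n=2)
print(scores_1, scores_2)
```

In this toy case five of the six reference unigrams and three of the five reference bigrams are matched, so the unigram score is higher than the bigram score. A real evaluation would average such scores over all test documents.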

Keywords: feature fusion; deep learning; Seq2Seq; CNN; GRU; text summarization

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]

 
