Authors: GAO Yang; YAN Ren-wu; YUAN Shuang-shuang (高阳, 闫仁武, 袁双双)
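Affiliations: [1] School of Computer, Jiangsu University of Science and Technology, Zhenjiang 212000, Jiangsu, China; [2] School of Software, Southeast University, Suzhou 215000, Jiangsu, China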
Source: Software Guide (《软件导刊》), 2020, No. 10, pp. 39-45 (7 pages)
Funding: National Natural Science Foundation of China (61772244).
Abstract: To address the information overload that massive data imposes on users, this paper analyzes news page data from sites such as People's Daily Online (人民网) and Sina.com (新浪网) and combines traditional methods with deep learning to propose M-C-G, a multi-document summarization method based on feature fusion, convolutional neural networks (CNN), and gated recurrent units (GRU). First, news pages covering 30 different topics are cleaned, a word vector model is trained with word2vec, and a preliminary summary is computed from multiple features. Next, a Seq2Seq model with CNN and GRU is trained on 83,000 Sohu news texts. Finally, the preliminary summary is fed into the trained model to produce the final summary. Experiments show that, under the ROUGE evaluation framework, the method improves accuracy by at least about 2% over existing methods and can effectively help users find valuable text information.
Keywords: feature fusion; deep learning; Seq2Seq; CNN; GRU; text summarization
Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]
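
Note on the evaluation metric: the abstract reports results under ROUGE but does not say which ROUGE variants or which tooling were used. As a rough illustration only, and not the authors' evaluation code, the following minimal sketch computes ROUGE-N from clipped n-gram overlap between a candidate summary and a single reference. The function names `ngrams` and `rouge_n` are hypothetical, and the inputs are assumed to be pre-tokenized lists (Chinese text would first need word segmentation or character-level splitting).

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list (empty if the list is shorter than n)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N recall, precision, and F1 for one candidate/reference pair.

    Overlap is clipped: an n-gram is matched at most as many times as it
    appears in the reference (Counter intersection takes the minimum count).
    """
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"recall": recall, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy English example; the sentences are illustrative, not from the paper.
    candidate = "the cat sat on the mat".split()
    reference = "the cat is on the mat".split()
    print(rouge_n(candidate, reference, n=1))
    print(rouge_n(candidate, reference, n=2))
```

In the toy example, five of the six reference unigrams and three of the five reference bigrams are matched. With several reference summaries, scores are usually aggregated across references; that step is left out here.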