Authors: 张昱 (ZHANG Yu), 刘开峰 (LIU Kai-feng)[1], 张全新 (ZHANG Quan-xin), 王艳歌 (WANG Yan-ge)[1], 高凯龙 (GAO Kai-long)
Affiliations: [1] School of Electrical and Information Engineering & Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; [2] State Key Laboratory in China for Geo Mechanics and Deep Underground Engineering, China University of Mining & Technology, Beijing 100083, China; [3] School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Source: Acta Electronica Sinica (《电子学报》), 2021, No. 6, pp. 1059-1067 (9 pages)
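Funding: Beijing University of Civil Engineering and Architecture Outstanding Lecturer Cultivation Program (No.21082718041); National Key Research and Development Program of China (No.2016YFC0600901); Ministry of Education 2018 Industry-University Cooperation Collaborative Education Project (No.201801113001); Fundamental Research Funds for Municipal Universities (No.30850919027); Beijing University of Civil Engineering and Architecture Graduate Innovation Project (No.PG2020051).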
Abstract: Most current research on news classification targets English, and the traditional machine learning methods in common use extract local text-block features incompletely when processing long texts. To address the lack of a dedicated term set for Chinese news classification, a vocabulary suited to Chinese news classification is built by constructing a data index, and it is combined with word2vec pre-trained word vectors to construct text features. To address incomplete feature extraction, the structure of the classical convolutional neural network model is modified and the effects of different convolution and pooling operations on the classification results are studied. To improve the precision of Chinese news text classification, this paper proposes and implements a combined-convolutional neural network model and designs effective model regularization and optimization methods. The experimental results show that the combined-convolutional neural network model reaches a precision of 93.69% on Chinese news text classification, which is 6.34% and 1.19% higher than the best traditional machine learning method and the classical convolutional neural network model, respectively, and that it also outperforms the comparison models in recall and F-measure.
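The vocabulary-building step mentioned in the abstract (mapping words to index entries before looking up word vectors) might be sketched as below. This is only an illustrative sketch, not the authors' implementation: the function names `build_vocab` and `encode`, the reserved `<PAD>`/`<UNK>` tokens, the frequency-based cutoff, and the toy documents are all assumptions, and the input is assumed to be already segmented into words.

```python
from collections import Counter

# Hypothetical reserved tokens: padding and out-of-vocabulary words.
PAD, UNK = "<PAD>", "<UNK>"

def build_vocab(tokenized_docs, max_size=5000):
    """Map the most frequent words to integer indices (assumed scheme)."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    # Keep the most frequent words, leaving two slots for PAD and UNK.
    words = [w for w, _ in counts.most_common(max_size - 2)]
    return {w: i for i, w in enumerate([PAD, UNK] + words)}

def encode(doc, vocab, seq_len):
    """Convert one tokenized document into a fixed-length index sequence."""
    ids = [vocab.get(tok, vocab[UNK]) for tok in doc[:seq_len]]
    # Right-pad short documents so every sequence has the same length.
    return ids + [vocab[PAD]] * (seq_len - len(ids))

# Toy, pre-segmented documents (made up for illustration).
docs = [["股市", "上涨", "股市"], ["球队", "获胜"]]
vocab = build_vocab(docs)
encoded = encode(["股市", "暴跌"], vocab, seq_len=4)
```

The resulting index sequences could then be used to look up rows of a word2vec embedding matrix before being fed to a convolutional model; how the paper actually does this is not stated in the abstract.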
Keywords: natural language processing; word vector; combined-convolutional neural network; Chinese news; text classification
Classification code: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]