Authors: ZHANG Bo; HUANG Xiaofang[1]
Affiliation: [1] School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, Sichuan, China
Source: Journal of Southwest University of Science and Technology, 2020, No. 1, pp. 64-69 (6 pages)
Funding: Young Scientists Fund (61702429).
Abstract: Initializing the embedding layer of a convolutional neural network (CNN) with word-level pre-trained embedding vectors can cause out-of-memory errors and long training times when computing resources are limited. For news text, this work assumes that removing some unimportant words does not affect the final classification result, and proposes a category keyword extraction method based on TF-IDF. Extracting keywords for each category shrinks the vocabulary and, in turn, the size of the embedding matrix. Experiments on the THUCNews dataset show that when the number of embedding-matrix parameters is reduced by nearly 89%, training time on the CPU falls by about 49% and model size by about 87%, with no adverse effect on classification performance.
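The abstract only names the idea: score words with TF-IDF per category and keep the top-scoring ones as the reduced vocabulary. As a minimal sketch of that idea (not the paper's actual algorithm; the function name, scoring details, and `top_k` cutoff are assumptions), one might treat each category's texts as a single pseudo-document and score terms by their within-category frequency times an inverse category frequency:

```python
from collections import Counter
import math

def category_keywords(docs_by_category, top_k=100):
    """Sketch of TF-IDF-based category keyword extraction.

    docs_by_category maps a category name to a list of tokenized
    documents. Each category's documents are pooled into one
    pseudo-document; terms are scored by term frequency within the
    category times log(inverse category frequency), and the top_k
    terms per category are unioned into the reduced vocabulary.
    """
    # Pooled term counts per category
    tf = {cat: Counter(tok for doc in docs for tok in doc)
          for cat, docs in docs_by_category.items()}
    n_cat = len(tf)
    # "Document" frequency: number of categories a term appears in
    df = Counter(term for counts in tf.values() for term in counts)
    vocab = set()
    for cat, counts in tf.items():
        total = sum(counts.values())
        scored = {term: (cnt / total) * math.log(n_cat / df[term])
                  for term, cnt in counts.items()}
        vocab |= {term for term, _ in
                  sorted(scored.items(), key=lambda kv: -kv[1])[:top_k]}
    return vocab
```

Terms that occur in every category get a score of zero and are naturally dropped, which matches the paper's premise that removing such unimportant words should not hurt classification; the surviving vocabulary is what would index a much smaller embedding matrix.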
Classification: TP391 [Automation and Computer Technology / Computer Application Technology]