Authors: 刘成锴, 王斌君, 吴勇 (LIU Cheng-kai; WANG Bin-jun; WU Yong), College of Information Technology and Network Security, People's Public Security University of China, Beijing 100038, China
Source: Science Technology and Engineering (《科学技术与工程》), 2019, No. 33, pp. 302-307 (6 pages)
Abstract: Text feature selection is a key problem in natural language processing. To address the high dimensionality and sparsity of text features, this paper builds on the filter-based feature selection algorithm term frequency-inverse document frequency (TF-IDF) and uses a genetic algorithm to optimize the selected text features so that they fit the subsequent text classification algorithm as closely as possible, reducing the feature dimension, and thus prediction time, while preserving classification accuracy. Experiments show that, compared with filter-based text feature selection alone, the algorithm effectively reduces the number of selected text features (that is, lowers the feature dimension) and effectively improves text classification ability.
Classification number: TP391.14 [Automation and Computer Technology: Computer Application Technology]
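
The abstract names two stages, a TF-IDF filter followed by genetic-algorithm optimization against the downstream classifier, but gives no implementation details. The following is only a minimal sketch of that general idea under stated assumptions: the bit-mask chromosome, the fitness (leave-one-out accuracy of a nearest-centroid classifier minus a feature-count penalty), truncation selection, one-point crossover, the toy corpus, and every function name and parameter value are illustrative choices, not the authors' method.

```python
import math
import random
from collections import Counter

def tfidf_scores(docs):
    """Map each term to its maximum TF-IDF weight over the corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = {}
    for d in docs:
        for t, c in Counter(d).items():
            w = (c / len(d)) * math.log(n / df[t])
            scores[t] = max(scores.get(t, 0.0), w)
    return scores

def filter_top_k(docs, k):
    """Filter stage: keep the k terms with the highest TF-IDF score."""
    scores = tfidf_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def accuracy(docs, labels, terms):
    """Leave-one-out accuracy of a nearest-centroid classifier (assumed stand-in)."""
    if not terms:
        return 0.0
    vecs = [[Counter(d)[t] for t in terms] for d in docs]
    correct = 0
    for i, v in enumerate(vecs):
        best, best_dist = None, float("inf")
        for lab in sorted(set(labels)):
            members = [vecs[j] for j in range(len(vecs)) if j != i and labels[j] == lab]
            if not members:
                continue
            centroid = [sum(col) / len(members) for col in zip(*members)]
            dist = sum((a - b) ** 2 for a, b in zip(v, centroid))
            if dist < best_dist:
                best, best_dist = lab, dist
        correct += best == labels[i]
    return correct / len(docs)

def fitness(mask, docs, labels, candidates, penalty=0.1):
    """Reward accuracy, penalize the fraction of candidate features kept."""
    terms = [t for t, bit in zip(candidates, mask) if bit]
    return accuracy(docs, labels, terms) - penalty * len(terms) / len(candidates)

def ga_select(docs, labels, candidates, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Wrapper stage: evolve bit masks over the candidate terms."""
    rng = random.Random(seed)
    n = len(candidates)  # assumes at least two candidate terms
    score = lambda m: fitness(m, docs, labels, candidates)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]              # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)             # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = elite + children
    best = max(pop, key=score)
    return [t for t, bit in zip(candidates, best) if bit]

# Toy corpus (invented for illustration only).
docs = [
    "goal match team score goal".split(),
    "team player match win".split(),
    "player goal coach team".split(),
    "stock market price trade".split(),
    "market bank stock profit".split(),
    "bank trade price market".split(),
]
labels = ["sport"] * 3 + ["finance"] * 3
candidates = filter_top_k(docs, k=8)
selected = ga_select(docs, labels, candidates)
print(len(candidates), "candidates ->", len(selected), "selected:", selected)
```

In this sketch the penalty weight controls the trade-off the abstract describes: a larger value pushes the search toward fewer features, a smaller one toward higher classification accuracy.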