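Affiliation: [1] Department of Computer Science, Aba Teachers College (阿坝师范高等专科学校), Wenchuan 623002
Source: Science Technology and Engineering (《科学技术与工程》), 2012, No. 34, pp. 9234-9237, 9242 (5 pages)
Funding: Supported by an Aba Teachers College university-level research project (ASB12-23)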
Abstract: Feature selection is one of the key steps in text categorization, and the quality of the selected feature subset directly affects the categorization results. First, word frequency and document frequency are analyzed, and an improved document frequency is proposed. Feature resolution is then defined on the basis of the improved document frequency. Next, rough sets are introduced into feature selection, and a new attribute reduction algorithm based on the correlation matrix of equivalence classes is given. Finally, feature resolution is combined with this attribute reduction algorithm to form a new feature selection method: it first uses feature resolution to select text features and filter out some terms, which reduces the sparsity of the text feature space, and then applies the attribute reduction algorithm to eliminate redundant features. Simulation results show that the proposed method has, to a certain extent, advantages in precision, recall, time performance, and average classification accuracy.
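Keywords: feature selection; text categorization; feature resolution; rough sets; correlation matrix
Classification: TP391.43 [Automation and Computer Technology: Computer Application Technology]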
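The two-stage pipeline described in the abstract (a frequency-based pre-selection followed by a rough-set style redundancy reduction) can be illustrated with a minimal sketch. This record does not define the paper's feature resolution or its correlation-matrix reduction algorithm, so the code below substitutes two stand-ins that are assumptions, not the authors' method: plain document-frequency thresholds for stage one, and a greedy reduction that drops a feature whenever the remaining features still separate the class labels for stage two. All function names and parameters (`select_by_df`, `greedy_reduct`, `min_df`, `max_df_ratio`) are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # a term counts once per document
    return df


def select_by_df(docs, min_df=2, max_df_ratio=0.9):
    """Stage 1 (stand-in): keep terms that are neither too rare nor too common."""
    df = document_frequency(docs)
    n = len(docs)
    return sorted(
        t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio
    )


def is_consistent(rows, labels, attrs):
    """True if documents that agree on every attribute in attrs share a label."""
    seen = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True


def greedy_reduct(rows, labels, attrs):
    """Stage 2 (stand-in): drop each attribute whose removal keeps the
    decision table consistent. The result depends on the visiting order."""
    kept = list(attrs)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if is_consistent(rows, labels, trial):
            kept = trial
    return kept


def to_rows(docs, terms):
    """Represent each document as a term -> presence (bool) mapping."""
    return [{t: (t in set(tokens)) for t in terms} for tokens in docs]


if __name__ == "__main__":
    docs = [
        ["ball", "team", "win", "the"],
        ["ball", "team", "score", "the"],
        ["stock", "market", "fall", "the"],
        ["stock", "market", "rise", "the"],
    ]
    labels = ["sport", "sport", "finance", "finance"]
    terms = select_by_df(docs)
    print(terms)
    print(greedy_reduct(to_rows(docs, terms), labels, terms))
```

In this toy corpus the first stage removes terms that appear in only one document or in every document, and the second stage then discards terms that carry no extra discriminating information once another term already separates the classes. A faithful implementation would replace both stand-ins with the definitions given in the full paper.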