一种基于CHI值特征选取的粗糙集文本分类规则抽取方法  被引量:8

Rough set text classification rule extraction based on CHI value

在线阅读下载全文

作  者:王明春[1] 王正欧[1] 张楷[2] 郝玺龙 

机构地区:[1]天津大学系统工程研究所,天津300072 [2]天津工程师范学院数理系,天津300222 [3]天津海量软件公司,天津300384

出  处:《计算机应用》2005年第5期1026-1028,1033,共4页journal of Computer Applications

基  金:国家自然科学基金资助项目(60275020)

摘  要:结合文本分类规则抽取的特点,给出了近似规则的定义。该方法首先利用CHI值进行特征选取并为下一步特征选取提供特征重要性信息,然后使用粗糙集对离散决策表继续进行特征选取,最后用粗糙集抽取出精确规则或近似规则。该方法将CHI值特征选取和粗糙集理论充分结合,避免了用粗糙集对大规模决策表进行特征约简,同时避免了决策表的离散化。该方法提高了文本规则抽取的效率,并使其更趋实用化。实验结果表明了这种方法的有效性和实用性。The definition of proximate rule was proposed based on the characteristic of text classification rule extraction. Based on the CHI values, the features of text set were selected firstly and feature significance information was provided to the further feature selection. Then rough set was used to select further the attributes on the discrete decision table. Finally precise rules or proximate rules were extracted using rough set theory. The method combined CHI value feature selection and rough set theory fully so as to avoid both feature reduction on a large scale decision table and the discretization of the decision table. The method improved the effectiveness and the practicability of extracting text rule greatly. Experiment results demonstrate the effectiveness of the method.

关 键 词:CHI值 特征选取 粗糙集 文本分类规则 

分 类 号:TP18[自动化与计算机技术—控制理论与控制工程]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象