The Research of Mutual Information Feature Selection Method Based on Quadratic Entropy (基于二次熵的互信息特征选取方法的研究)  Cited by: 2

Authors: LIU Lizhen (刘丽珍)[1], SONG Hantao (宋瀚涛)[2], LU Yuchang (陆玉昌)[3]

Affiliations: [1] Capital Normal University, Beijing 100037; [2] Beijing Institute of Technology, Beijing 100081; [3] Tsinghua University, Beijing 100084

Source: Computer Science (《计算机科学》), 2004, No. 12, pp. 135-136, 168 (3 pages)

Funding: National Key Basic Research Program of China (973 Program), grant G1998030414

Abstract: As the Web has spread worldwide, a vast body of online resources with no uniform structure or management urgently needs processing. Efficient automatic web page classification is a key technology for extracting the required information from the mass of information online, and feature selection is in turn an important foundation of text classification mining. Taking generalized information theory as the theoretical basis, this paper proposes a mutual information feature selection method based on quadratic entropy (QEMI). The method evaluates each feature in the feature set independently and analyzes the relationship between features and classes, selecting from the high-dimensional feature space the features that are effective for text classification. This reduces the dimensionality of the text feature space and improves text classification performance.
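
This record does not give the paper's formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' QEMI definition. It assumes the common quadratic entropy H2(p) = 1 - Σ p_i², binary term presence as the feature, and a score of the form H2(C) - Σ_t P(t)·H2(C | t). The function names and the toy corpus are hypothetical.

```python
from collections import Counter


def quadratic_entropy(probs):
    """Quadratic entropy H2(p) = 1 - sum(p_i^2) (assumed definition)."""
    return 1.0 - sum(p * p for p in probs)


def quadratic_mutual_information(docs, labels, term):
    """Score one term as H2(C) - sum_t P(t) * H2(C | t), t in {present, absent}.

    This form is an assumption for illustration; the paper may define QEMI differently.
    """
    n = len(docs)
    class_counts = Counter(labels)
    h_class = quadratic_entropy(c / n for c in class_counts.values())

    # Split the class labels by whether the term occurs in each document.
    groups = {True: [], False: []}
    for doc, label in zip(docs, labels):
        groups[term in doc].append(label)

    h_cond = 0.0
    for group in groups.values():
        if not group:
            continue  # an empty split contributes nothing
        weight = len(group) / n
        counts = Counter(group)
        h_cond += weight * quadratic_entropy(c / len(group) for c in counts.values())
    return h_class - h_cond


def select_features(docs, labels, k):
    """Evaluate every term independently and keep the k highest-scoring ones."""
    vocabulary = set().union(*docs)
    ranked = sorted(
        vocabulary,
        key=lambda t: (-quadratic_mutual_information(docs, labels, t), t),
    )
    return ranked[:k]


# Toy corpus: each document is a set of tokens (hypothetical data).
docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sports", "sports", "finance", "finance"]
print(select_features(docs, labels, 2))
```

In this toy corpus, "team" appears equally in both classes and scores zero, while "goal" and "stock" separate the classes perfectly, which is the kind of per-feature relevance ranking the abstract describes.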

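Keywords: feature selection; text classification; feature set; mutual information; high dimensionality; web pages; feature space; extraction; classification method; processing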

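Classification codes: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]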

 
