Multi-level Feature Selection Mechanism Based on MapReduce  Cited by: 9


Authors: SONG Zhe-li (宋哲理); WANG Chao (王超); WANG Zhen-fei (王振飞)[3]

Affiliations: [1] Zhengzhou Vocational College of Finance and Taxation, Zhengzhou 450048, China; [2] The 713th Research Institute of China Shipbuilding Industry Corporation, Zhengzhou 450015, China; [3] School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China

Source: Computer Science (《计算机科学》), 2018, Issue B11, pp. 468-473, 479 (7 pages)

Funding: Supported by the National Natural Science Foundation of China (61379079)

Abstract: Feature selection is a key step in text classification, and classification accuracy depends largely on the quality of the selected feature words. This paper proposes a multi-level feature selection mechanism based on MapReduce. On the one hand, the mechanism applies an improved CHI feature selection algorithm for initial screening, then uses the mutual information method to filter noise words out of the initial result and to move high-quality feature words to the front. On the other hand, the mechanism is implemented on the MapReduce model to reduce the time cost of multi-level feature selection on massive data. Experimental results show that the mechanism can process large-scale data in a relatively short time while also improving the accuracy of text classification.

Keywords: text classification; feature selection; CHI; mutual information; MapReduce

Classification code: TP301 [Automation and Computer Technology: Computer System Architecture]
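
Illustrative sketch: the two-level selection outlined in the abstract (CHI screening first, then mutual-information filtering and reordering, with document-frequency counting expressed as map and reduce steps) might look roughly like the Python below. This is not the authors' implementation. The record does not describe the paper's improved CHI, so the standard chi-square statistic stands in for it; the map and reduce steps run in-process rather than on a Hadoop cluster; and every function name and parameter (map_doc, reduce_counts, select_features, k, mi_floor) is hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def map_doc(label, tokens):
    """Map step: emit ((term, label), 1) once per distinct term in a document."""
    return [((term, label), 1) for term in set(tokens)]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts for each (term, label) key."""
    totals = Counter()
    for key, n in pairs:
        totals[key] += n
    return totals


def chi_square(a, b, c, d):
    """Standard chi-square (not the paper's improved variant).

    a: docs in the class containing the term
    b: docs outside the class containing the term
    c: docs in the class without the term
    d: docs outside the class without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def mutual_information(a, b, c, d):
    """Pointwise mutual information log(P(t, c) / (P(t) * P(c)))."""
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log(a * n / ((a + b) * (a + c)))


def select_features(docs, k, mi_floor=0.0):
    """docs is a list of (label, tokens) pairs; returns the selected terms."""
    df = reduce_counts(chain.from_iterable(map_doc(l, t) for l, t in docs))
    class_sizes = Counter(label for label, _ in docs)
    term_df = Counter()
    for (term, _), n in df.items():
        term_df[term] += n

    # Score each term by its best (chi, mi) pair over all classes.
    scored = {}
    for (term, label), a in df.items():
        b = term_df[term] - a
        c = class_sizes[label] - a
        d = len(docs) - a - b - c
        pair = (chi_square(a, b, c, d), mutual_information(a, b, c, d))
        if term not in scored or pair > scored[term]:
            scored[term] = pair

    # Level 1: keep the k terms with the highest chi-square score.
    first = sorted(scored, key=lambda t: (-scored[t][0], t))[:k]
    # Level 2: drop terms whose MI is not above the floor (assumed noise),
    # then put the terms with the highest MI first.
    kept = [t for t in first if scored[t][1] > mi_floor]
    return sorted(kept, key=lambda t: (-scored[t][1], t))


docs = [
    ("sports", ["goal", "match", "the"]),
    ("sports", ["goal", "team", "the"]),
    ("finance", ["stock", "market", "the"]),
    ("finance", ["stock", "bank", "the"]),
]
selected = select_features(docs, k=10)
```

In this toy corpus the word "the" occurs in every document, so both its chi-square score and its mutual information are zero and the second level discards it. On a real cluster, map_doc and reduce_counts would correspond to the mapper and reducer of a counting job, with the scoring done in a later stage.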

 
