Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), 2015, No. 10, pp. 2213-2217 (5 pages)
Funding: Supported by the Key Program of the National Natural Science Foundation of China (41231171)
Abstract: Data classification is an important problem in data mining, but current data mining work faces the challenge of large-sample, high-dimensional data. Selecting an effective feature subset reduces the dimensionality of the data and is the basis for subsequent classification. Currently popular feature selection methods do not adapt well to high-dimensional data, and their accuracy is limited. This paper therefore proposes a feature selection method based on the t-test and the elastic net. The basic idea is to use t-tests to measure how much each feature differs between classes, then to analyze the features with larger differences using an elastic net regression model, and finally to obtain the feature subset through shrinkage of the regression coefficients and the misclassification rate. Experiments show that the method performs well in terms of accuracy, stability, and time cost.
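The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Welch form of the t statistic, the ±1 coding of the two class labels, the fixed values of `top_k`, `lam` and `alpha`, and the synthetic data are all assumptions made for illustration. The final step described in the abstract, choosing the subset by misclassification rate, is left out; a fixed penalty is used instead.

```python
import math
import random


def welch_t(a, b):
    """Welch's t statistic for two samples given as lists of floats."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)
    return (ma - mb) / se if se > 0 else 0.0


def soft_threshold(z, g):
    """Shrink z toward zero by g; values within [-g, g] become zero."""
    return math.copysign(max(abs(z) - g, 0.0), z)


def elastic_net(cols, y, lam, alpha, n_iter=100):
    """Coordinate descent for
    (1/2n)*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2),
    where cols holds the columns of X (assumed centered, as is y)."""
    n = len(y)
    beta = [0.0] * len(cols)
    norms = [sum(c * c for c in col) / n for col in cols]
    resid = list(y)  # residual y - X b, with b = 0 initially
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j] * norms[j]
            new = soft_threshold(rho, lam * alpha) / (norms[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def select_features(X, y, top_k=4, lam=0.1, alpha=0.5):
    """Stage 1: rank features by |t| between class 0 and class 1, keep top_k.
    Stage 2: fit an elastic net on the kept features and return the indices
    whose coefficients were not shrunk to zero."""
    n, p = len(X), len(X[0])
    scores = []
    for j in range(p):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        scores.append(abs(welch_t(a, b)))
    keep = sorted(range(p), key=lambda j: -scores[j])[:top_k]

    cols = []
    for j in keep:  # standardize each kept column
        col = [row[j] for row in X]
        m = sum(col) / n
        s = math.sqrt(sum((v - m) ** 2 for v in col) / n) or 1.0
        cols.append([(v - m) / s for v in col])

    target = [1.0 if c == 1 else -1.0 for c in y]  # assumed label coding
    mt = sum(target) / n
    target = [t - mt for t in target]

    beta = elastic_net(cols, target, lam, alpha)
    return sorted(j for j, b in zip(keep, beta) if b != 0.0)


# Synthetic example: features 0 and 1 differ between classes, the rest are noise.
random.seed(0)
labels = [0] * 20 + [1] * 20
data = [[random.gauss(3.0 * c if j < 2 else 0.0, 1.0) for j in range(10)]
        for c in labels]
selected = select_features(data, labels, top_k=4)
print(selected)
```

The t-test stage cheaply discards most features before the more expensive regression runs, which is where the time savings on high-dimensional data would come from. In practice `lam` and `alpha` would be tuned, for example by cross-validated misclassification rate, rather than fixed as they are here.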
Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]