Authors: ZHOU Gang; GUO Fu-liang[1] (Naval University of Engineering, Wuhan 430033, China)
Affiliation: [1] Naval University of Engineering, Wuhan 430033, China
Source: Computer Science (《计算机科学》), 2021, No. S01, pp. 250-254 (5 pages)
Abstract: Prediction-error analysis and the bias-variance decomposition of ensemble learning show that an ensemble built from a limited number of accurate and diverse base learners achieves better generalization accuracy. A two-stage feature-selection ensemble learning method based on information entropy is constructed. In the first stage, a base feature set B, whose features each exceed 0.5 accuracy, is built according to relative classification information entropy. In the second stage, independence is judged on B by a mutual-information criterion and independent feature subsets are constructed with a greedy algorithm; the Jaccard coefficient is then used to evaluate diversity among the subsets, and diverse independent subsets are selected to build the base learners. Data experiments show that the optimized method outperforms plain Bagging in both execution efficiency and test accuracy, with the largest gains on multi-class high-dimensional datasets, but it is not suitable for binary classification problems.
Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
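The two-stage procedure in the abstract can be sketched as follows. This is a minimal illustration under assumed stand-ins, not the authors' implementation: a histogram plug-in estimator replaces the paper's entropy measures, a median-split decision stump serves as the stage-1 per-feature accuracy filter (the paper uses relative classification information entropy), and the thresholds `mi_cap` and `div_cap` are hypothetical parameters chosen for the demo.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Plug-in mutual-information estimate (nats) via joint histogram binning."""
    ex = np.histogram_bin_edges(x, bins)[1:-1]
    ey = np.histogram_bin_edges(y, bins)[1:-1]
    joint = np.zeros((bins, bins))
    for i, j in zip(np.digitize(x, ex), np.digitize(y, ey)):
        joint[i, j] += 1
    joint /= joint.sum()
    px, py = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def jaccard(a, b):
    """Jaccard similarity between two feature subsets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def stump_accuracy(x, y):
    """Accuracy of a one-feature stump: median split, majority vote per side."""
    t = np.median(x)
    correct = 0
    for side in (y[x <= t], y[x > t]):
        if side.size:
            correct += np.unique(side, return_counts=True)[1].max()
    return correct / y.size

def build_subsets(X, B, n_subsets=2, k=2, mi_cap=0.3, div_cap=0.5, seed=0):
    """Stage 2: greedily grow subsets of mutually low-MI ('independent') features
    from B, keeping only subsets that stay diverse (low Jaccard) w.r.t. earlier ones."""
    rng = np.random.default_rng(seed)
    subsets = []
    for _ in range(10 * n_subsets):          # bounded number of candidate attempts
        order = rng.permutation(B)
        sub = [int(order[0])]
        for f in order[1:]:
            if len(sub) == k:
                break
            if all(mutual_info(X[:, f], X[:, g]) < mi_cap for g in sub):
                sub.append(int(f))
        if all(jaccard(sub, s) < div_cap for s in subsets):
            subsets.append(sorted(sub))
        if len(subsets) == n_subsets:
            break
    return subsets

# Synthetic demo: two informative (mutually redundant) features, two noise features.
rng = np.random.default_rng(42)
n = 300
y = rng.integers(0, 2, n)
X = np.column_stack([
    y + rng.normal(0, 0.2, n),   # informative
    y + rng.normal(0, 0.2, n),   # redundant with feature 0
    rng.normal(0, 1, n),         # noise
    rng.normal(0, 1, n),         # noise
])

# Stage 1: keep features whose single-feature accuracy exceeds 0.5.
B = [f for f in range(X.shape[1]) if stump_accuracy(X[:, f], y) > 0.5]
# Stage 2: independent, diverse feature subsets, one base learner per subset.
subsets = build_subsets(X, B)
```

In the full method each selected subset would then train one base learner (e.g. a tree) and the ensemble would aggregate their votes, as in Bagging; that training step is omitted here for brevity.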