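Affiliations: [1] School of Management, Hefei University of Technology, Hefei 230009; [2] Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei 230009; [3] School of Business, Huzhou University, Huzhou, Zhejiang 313000; [4] School of Management Science and Engineering, Anhui University of Finance and Economics, Bengbu, Anhui 233030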
Source: Chinese Journal of Computers (《计算机学报》), 2019, No. 6, pp. 1252-1273 (22 pages)
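Funding: National Natural Science Foundation of China (91546108, 71490725); National Key Research and Development Program of China (2016YFF0202604); Natural Science Foundation of Anhui Province (1908085QG298, 1708085MG169); Humanities and Social Sciences Project of the Anhui Provincial Department of Education (JS2017AJRW0135); Natural Science Foundation Project of the Huzhou Science and Technology Program (2018YZ11); Open Project of the Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education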
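Abstract (translated from the Chinese): Ensemble pruning is a key technique for improving the performance of classifier ensembles: by selecting a smaller set of base classifiers, it achieves better ensemble performance. Existing ensemble pruning methods usually rely on either a diversity measure among base classifiers or a metaheuristic algorithm alone. The average accuracy and the diversity of the base classifiers are widely regarded as the two important indicators for ensemble pruning, but increasing the diversity among base classifiers inevitably lowers their average classification accuracy, and raising their average accuracy likewise lowers their diversity. There is therefore a balance between average accuracy and diversity at which ensemble performance is optimal, and finding this balance is the key to successful ensemble pruning. Ensemble pruning is an NP-complete problem. Diversity measures can only remove some of the redundant base classifiers in the ensemble and struggle to locate the balance precisely; metaheuristic algorithms perform well at searching for the balance, but used alone they can hardly search for it exhaustively. This paper therefore proposes an ensemble pruning method that combines an improved binary glowworm swarm optimization algorithm with the margin distance minimization measure. First, the training set is resampled repeatedly with the bootstrap method to obtain multiple training subsets, and a classifier is trained independently on each subset to obtain multiple base classifiers. Second, the margin distance minimization measure is used to pre-prune the base classifiers, removing those with poor overall performance and significantly reducing the complexity of the pruning problem. Next, an improved binary glowworm swarm optimization algorithm is proposed by modifying the glowworms' movement pattern and search process and by introducing competition and jumping behaviors. Finally, the improved algorithm performs a second pruning pass over the pre-pruned base classifiers to select the best-performing sub-ensemble. Experiments on 35 UCI benchmark datasets show that, compared with other methods, the proposed method selects fewer base classifiers and achieves higher ensemble classification accuracy; the experiments also verify its effectiveness and significance.
English abstract: Ensemble pruning is a key technique for achieving better ensemble performance with a smaller set of base classifiers by finding an optimal sub-ensemble. Existing ensemble pruning approaches usually find the optimal sub-ensemble either with diversity measures among base classifiers or with heuristic search algorithms, used separately. Diversity and accuracy of base classifiers are widely recognized as two important properties for a successful ensemble, but increasing the diversity of the base classifiers in the initial pool must decrease their average accuracy, and improving their average accuracy must reduce the diversity among them. Therefore, there is a tradeoff between the diversity and the accuracy of the base classifiers in the initial pool at which the ensemble performs at its best. Finding this tradeoff is the key to a successful ensemble. Ensemble pruning is an NP-complete problem.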
Ensemble pruning approaches based on diversity measures, whatever their strategy, prune only part of the redundant classifiers and cannot exactly find the tradeoff between the diversity and the accuracy of classifiers. Ensemble pruning methods based on heuristic search algorithms achieve good results in finding this tradeoff, but a heuristic search for the optimal sub-ensemble can hardly be exhaustive. Hence, Improved Binary Glowworm Swarm Optimization combined with Margin distance minimization measure for Ensemble Pruning (IBGSOMEP) is proposed, which combines the proposed improved binary glowworm swarm optimization with the margin distance minimization measure. Firstly, a set of training subsets is obtained using the bootstrap sampling method, and a colle
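The pre-pruning step named in the abstract can be illustrated with a minimal sketch of margin distance minimization in a commonly described greedy form. This is not the paper's implementation: the abstract does not define the measure, so the signature vectors, the reference value `p`, the averaging over the current sub-ensemble size, and the function names `mdm_order` and `pre_prune` are all assumptions made for illustration.

```python
import math


def signature_vectors(predictions, labels):
    """predictions[t][i] is classifier t's label for sample i.

    Each entry of a signature vector is +1 if the classifier is correct
    on that sample and -1 otherwise.
    """
    return [[1 if p == y else -1 for p, y in zip(preds, labels)]
            for preds in predictions]


def mdm_order(predictions, labels, p=0.075):
    """Greedily order classifiers by margin distance minimization.

    At each step, add the classifier whose inclusion brings the averaged
    signature vector closest (squared Euclidean distance) to a reference
    point whose components all equal p. The value of p and the averaging
    over the current size are assumptions of this sketch.
    """
    sigs = signature_vectors(predictions, labels)
    n = len(labels)
    remaining = list(range(len(sigs)))
    selected = []
    running = [0.0] * n  # component-wise sum of the selected signatures
    while remaining:
        size = len(selected) + 1
        best, best_dist = None, math.inf
        for t in remaining:
            dist = sum((p - (running[i] + sigs[t][i]) / size) ** 2
                       for i in range(n))
            if dist < best_dist:  # on a tie, the earlier index is kept
                best, best_dist = t, dist
        selected.append(best)
        remaining.remove(best)
        running = [r + s for r, s in zip(running, sigs[best])]
    return selected


def pre_prune(predictions, labels, keep, p=0.075):
    """Keep the first `keep` classifiers of the greedy ordering."""
    return mdm_order(predictions, labels, p)[:keep]


# Toy validation data: three classifiers, four samples.
labels = [0, 1, 0, 1]
predictions = [
    [0, 1, 0, 1],  # correct on every sample
    [0, 1, 1, 0],  # correct on the first two samples
    [1, 0, 0, 1],  # correct on the last two samples
]
print(pre_prune(predictions, labels, keep=2))
```

Working on signature vectors rather than raw accuracy lets the ordering reward classifiers that are correct where the already selected ones are wrong, which is one way to read the accuracy and diversity balance the abstract describes.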
Keywords: glowworm swarm optimization; binary discretization; margin distance minimization; ensemble pruning; diversity
Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
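The second pruning pass described in the abstract searches over binary selection masks, one bit per base classifier. The sketch below is a generic binary swarm-style search, not the paper's improved binary glowworm swarm optimization, whose movement, competition, and jumping rules are not given in this record. The copy-toward-a-brighter-neighbour move, the random bit flip standing in for a jump, the tie-break toward smaller sub-ensembles, and every name and parameter (`binary_swarm_prune`, `copy_prob`, `flip_prob`) are assumptions made for illustration.

```python
import random


def vote_accuracy(mask, predictions, labels):
    """Majority-vote accuracy of the classifiers whose mask bit is 1."""
    chosen = [preds for bit, preds in zip(mask, predictions) if bit]
    if not chosen:
        return 0.0
    correct = 0
    for i, y in enumerate(labels):
        votes = {}
        for preds in chosen:
            votes[preds[i]] = votes.get(preds[i], 0) + 1
        # Vote ties go to the smallest label, an arbitrary choice here.
        winner = min(votes, key=lambda lab: (-votes[lab], lab))
        correct += winner == y
    return correct / len(labels)


def binary_swarm_prune(predictions, labels, n_worms=10, n_iter=30,
                       copy_prob=0.5, flip_prob=0.05, seed=0):
    """Search for a selection mask with high voting accuracy."""
    rng = random.Random(seed)
    n_clf = len(predictions)

    def score(mask):
        # Higher accuracy first; among equals, fewer selected classifiers.
        return (vote_accuracy(mask, predictions, labels), -sum(mask))

    swarm = [[rng.randint(0, 1) for _ in range(n_clf)]
             for _ in range(n_worms)]
    best = max(swarm, key=score)[:]
    for _ in range(n_iter):
        scores = [score(m) for m in swarm]
        new_swarm = []
        for mask, own in zip(swarm, scores):
            brighter = [m for m, s in zip(swarm, scores) if s > own]
            target = rng.choice(brighter) if brighter else mask
            # Move: copy each bit from the brighter mask with copy_prob.
            moved = [t if rng.random() < copy_prob else b
                     for b, t in zip(mask, target)]
            # Stand-in for a jump: flip each bit with a small probability.
            moved = [1 - b if rng.random() < flip_prob else b
                     for b in moved]
            new_swarm.append(moved)
        swarm = new_swarm
        candidate = max(swarm, key=score)
        if score(candidate) > score(best):
            best = candidate[:]
    return best, vote_accuracy(best, predictions, labels)


# Toy validation data: three classifiers, four samples.
labels = [0, 1, 0, 1]
predictions = [[0, 1, 0, 1], [0, 1, 1, 0], [1, 0, 0, 1]]
mask, acc = binary_swarm_prune(predictions, labels)
print(mask, acc)
```

Keeping the best mask seen so far means the returned sub-ensemble is never worse than the best one in the initial swarm. In the pipeline the abstract outlines, `predictions` would hold only the base classifiers that survived pre-pruning, which shrinks the search space.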