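Authors: 琚春华 (Ju Chunhua)[1,2], 邹江波 (Zou Jiangbo)[1], 魏建良 (Wei Jianliang)[1], 张华 (Zhang Hua)[1]
Affiliations: [1] School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018; [2] Modern Business and Trade Research Center, Zhejiang Gongshang University, Hangzhou, Zhejiang 310018
Source: Journal of Industrial Engineering and Engineering Management (《管理工程学报》), 2013, No. 4, pp. 119-125 (7 pages)
Funding: National Natural Science Foundation of China (71071141); Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20103326110001)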
Abstract: Ensemble classifiers have been widely used in data stream classification models to weaken the impact of concept drift. Typically, when the classification accuracy of a base classifier falls below a given threshold, the ensemble classifier begins learning a replacement for that low-accuracy classifier, thereby counteracting concept drift. However, starting to learn only when a base classifier's accuracy drops below the threshold makes the ensemble's recognition of the current concept lag. This paper therefore incorporates scenario characteristic analysis into the ensemble classifier: scenario characteristics are extracted with the information gain method, and their thresholds are set dynamically so that the occurrence of concept drift can be predicted in advance. When the change in the scenario characteristics exceeds the scenario threshold, the ensemble classifier is immediately notified to relearn and generate a new base classifier, instead of waiting until a base classifier's accuracy falls below the ensemble's threshold. This gives the ensemble classifier a degree of feedforward capability. Experimental analysis on specific data shows that the proposed OCEC (Origin Characteristics Ensemble Classifier) model reduces the ensemble generalization error when mining data streams with concept drift and improves the effectiveness of concept drift detection.

Data mining techniques have been applied in many fundamental domains, such as retailing, the stock market, the telecommunications industry, and medicine. The data streams generated in these industries are unstable and change constantly. Moreover, these changes are unpredictable and make the target concepts dynamic, a phenomenon generally known in the literature as concept drift. The relationship between the hidden context and the concepts also remains unclear. Modeling data streams that contain concept drift is one of the core problems in data mining, because a changing target concept reduces model accuracy and requires the corresponding decision model to be revised to process the current input data. The models and algorithms in the existing literature fall into three groups: (1) instance-based selection learning methods, (2) instance-based weighting learning methods, and (3) ensemble classification learning methods (or learning with multiple concept descriptions). In the ensemble approach, the base classifiers reflect the current concept, and the class of a sample is predicted by integrating all of their classification results. Ensemble classifiers have been widely used to weaken the impact of concept drift on data stream classification models. When the predictive accuracy of a base classifier falls below a given threshold, the ensemble learns a new base classifier to replace the old one, thereby overcoming the influence of concept drift.
However, because the ensemble classifier starts to learn only when a base classifier's accuracy falls below the threshold, its identification of the current concept may lag. This paper proposes a new method that adds scenario characteristic analysis to the ensemble classifier and uses the information gain method to extract scenario characteristics. In addition, the threshold of the scenario characteristic is set
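The abstract names two ingredients: information gain for choosing scenario characteristics, and a threshold on their change that triggers relearning before base-classifier accuracy degrades. The Python sketch below illustrates both under stated assumptions; it is not the authors' OCEC implementation. The helpers `information_gain` and `select_scenario_features` and the class `ScenarioDriftTrigger` are hypothetical. Because the abstract specifies neither the change measure nor the threshold rule, the sketch substitutes total variation distance between a reference window and the current window, and a "dynamic" threshold of max(floor, mean + z * std) over past non-drift changes. It also assumes discrete feature values and non-empty windows.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X=v) * H(Y | X=v) for a discrete feature X."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def select_scenario_features(rows, labels, k=1):
    """Hypothetical helper: indices of the k feature columns with the highest IG."""
    gains = [
        (information_gain([row[j] for row in rows], labels), j)
        for j in range(len(rows[0]))
    ]
    gains.sort(reverse=True)
    return [j for _, j in gains[:k]]


class ScenarioDriftTrigger:
    """Hypothetical drift trigger on one scenario feature.

    Compares the feature's value distribution in a reference window with the
    current window. The threshold is derived from past non-drift changes as
    max(floor, mean + z * std); this rule is an assumption, not from the paper.
    """

    def __init__(self, z=3.0, min_history=5, floor=0.05):
        self.z = z
        self.min_history = min_history
        self.floor = floor
        self.history = []  # changes observed while no drift was signalled

    @staticmethod
    def change(ref_values, cur_values):
        """Total variation distance between two non-empty empirical distributions."""
        ref, cur = Counter(ref_values), Counter(cur_values)
        return 0.5 * sum(
            abs(ref[v] / len(ref_values) - cur[v] / len(cur_values))
            for v in set(ref) | set(cur)
        )

    def update(self, ref_values, cur_values):
        """Return True if the change exceeds the current dynamic threshold."""
        d = self.change(ref_values, cur_values)
        if len(self.history) < self.min_history:
            self.history.append(d)  # warm-up: collect changes, never fire
            return False
        threshold = max(self.floor, mean(self.history) + self.z * pstdev(self.history))
        if d > threshold:
            # Drift signalled: the caller should retrain and reset the reference window.
            return True
        self.history.append(d)
        return False
```

When `update` returns `True`, the caller would train a new base classifier on the current window and add it to the ensemble, instead of waiting for an existing member's accuracy to fall below the ensemble's threshold. The `floor` parameter keeps the trigger from firing on tiny fluctuations when past changes have been nearly constant.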