Clustering feature decision trees for semi-supervised classification from high-speed data streams  Cited by: 4


Authors: Wen-hua XU, Zheng QIN, Yang CHANG

Affiliations: [1] Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; [2] School of Software, Tsinghua University, Beijing 100084, China

Source: Journal of Zhejiang University-Science C (Computers and Electronics), English edition, 2011, Issue 8, pp. 615-628 (14 pages)

Funding: Supported by the National Natural Science Foundation of China (No. 60673024) and the "Eleventh Five" Preliminary Research Project of PLA (No. 102060206)

Abstract: Most stream data classification algorithms apply the supervised learning strategy, which requires massive labeled data. Such approaches are impractical since labeled data are usually hard to obtain in reality. In this paper, we build a clustering feature decision tree model, CFDT, from data streams having both unlabeled and a small number of labeled examples. CFDT applies a micro-clustering algorithm that scans the data only once to provide the statistical summaries of the data for incremental decision tree induction. Micro-clusters also serve as classifiers in tree leaves to improve classification accuracy and reinforce the any-time property. Our experiments on synthetic and real-world datasets show that CFDT is highly scalable for data streams while generating high classification accuracy with high speed.

Keywords: Clustering feature vector; Decision tree; Semi-supervised learning; Stream data classification; Very fast decision tree

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
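
Note: The abstract describes micro-clusters that summarize a stream in a single pass and also act as classifiers. The following is a minimal sketch of that general idea using a BIRCH-style clustering feature (count, linear sum, squared sum) extended with label counts. It is not the paper's CFDT algorithm: the decision tree is omitted, and the names `MicroCluster`, `update`, `classify`, and the fixed `radius` threshold are assumptions made for illustration only.

```python
import math


class MicroCluster:
    """Clustering feature summary: count, linear sum, squared sum, label counts."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim   # per-dimension linear sum
        self.ss = [0.0] * dim   # per-dimension squared sum
        self.label_counts = {}  # counts of the (few) labeled examples absorbed

    def absorb(self, x, label=None):
        # Incremental update: the raw example is not stored.
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v
        if label is not None:
            self.label_counts[label] = self.label_counts.get(label, 0) + 1

    def centroid(self):
        return [s / self.n for s in self.ls]

    def majority_label(self):
        if not self.label_counts:
            return None
        return max(self.label_counts, key=self.label_counts.get)


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def update(clusters, x, label=None, radius=1.0):
    """Absorb x into the nearest cluster within `radius`, else open a new one."""
    best, best_d = None, radius
    for c in clusters:
        d = distance(c.centroid(), x)
        if d <= best_d:
            best, best_d = c, d
    if best is None:
        best = MicroCluster(len(x))
        clusters.append(best)
    best.absorb(x, label)
    return best


def classify(clusters, x):
    """Predict with the majority label of the nearest cluster that has labels."""
    labeled = [c for c in clusters if c.label_counts]
    if not labeled:
        return None
    nearest = min(labeled, key=lambda c: distance(c.centroid(), x))
    return nearest.majority_label()


clusters = []
update(clusters, [0.0, 0.0], label="a")  # labeled example
update(clusters, [0.2, 0.0])             # unlabeled, joins the first cluster
update(clusters, [5.0, 5.0], label="b")  # far away, opens a second cluster
update(clusters, [5.1, 5.0])             # unlabeled, joins the second cluster
print(len(clusters), classify(clusters, [0.3, 0.1]))
```

In this sketch, unlabeled examples still refine the centroids, while the few labeled ones determine what each cluster predicts, which is the semi-supervised intuition the abstract points to.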

 
