Bayesian classification algorithm of dynamic data stream based on bootstrap

Original title: 结合自助抽样的动态数据流贝叶斯分类算法

Cited by: 3

Authors: JU Chunhua (琚春华)[1], YIN Xianjun (殷贤君)[1], XU Chonghuan (许翀寰)[1]

Affiliation: [1] College of Computer Science and Information Engineering, Zhejiang Gongshang University, Hangzhou 310018

Source: Computer Engineering and Applications (《计算机工程与应用》), 2011, No. 8, pp. 118-121, 142 (5 pages)

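Funding: National Natural Science Foundation of China (No. 70671094); Zhejiang Science and Technology Plan Project (No. 2008C14061); Key Project of the Zhejiang Provincial Natural Science Foundation (No. Z1091224); Zhejiang Provincial Natural Science Foundation (No. Y1090617)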

Abstract: Dynamic data streams are large in volume, change quickly, are costly to access at random, and are hard to store in full detail, so mining them places heavy demands on computing power and storage capacity. To address these characteristics, a Bayesian classification algorithm for dynamic data streams based on bootstrap sampling is proposed. The algorithm processes the stream with a sliding window model, which takes the data in each window as the basic unit of processing and analysis. It uses bootstrap sampling to prune and optimize the attributes of the data to be classified, which addresses multicollinearity among the attributes. Drawing on the characteristics of the Bayesian algorithm, it uses a dynamic incremental storage tree to store the dynamic sample data stream, so that an unbounded dynamic stream is held in bounded static storage without loss of information; this addresses data storage, the biggest difficulty in dynamic data stream mining. The optimized data are then classified with an all-Bayesian classifier and a k-Bayesian classifier, and both classifiers are updated in real time according to the characteristics of the stream. The algorithm relaxes the attribute independence constraint of Bayesian classification and removes the restriction of traditional Bayesian classifiers to static data. Experimental tests show that the bootstrap-based Bayesian classification achieves high timeliness and accuracy.
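The abstract does not give the data structures in detail, so the following is only a minimal sketch of the general idea of window-by-window processing with count-based storage. It assumes categorical attributes, fixed-size non-overlapping windows, and a flat table of counts standing in for the incremental storage tree. The names `CountingNaiveBayes` and `windows` are hypothetical, and the bootstrap attribute pruning and the all-/k-Bayesian classifiers are not reproduced.

```python
from collections import Counter, defaultdict
import math


def windows(stream, size):
    """Yield consecutive fixed-size batches from an iterable stream."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, window
        yield batch


class CountingNaiveBayes:
    """Categorical naive Bayes that keeps only counts, never raw records."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.class_counts = Counter()             # label -> number of records
        self.value_counts = defaultdict(Counter)  # (attr index, label) -> value counts
        self.values_seen = defaultdict(set)       # attr index -> distinct values

    def update(self, window):
        """Fold one window of (features, label) pairs into the counts."""
        for features, label in window:
            self.class_counts[label] += 1
            for i, value in enumerate(features):
                self.value_counts[(i, label)][value] += 1
                self.values_seen[i].add(value)

    def predict(self, features):
        """Return the label with the highest smoothed log posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for i, value in enumerate(features):
                count = self.value_counts.get((i, label), {}).get(value, 0)
                k = len(self.values_seen.get(i, ())) or 1
                score += math.log((count + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy stream of (features, label) records, consumed two records at a time.
toy_stream = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
]
model = CountingNaiveBayes()
for window in windows(toy_stream, size=2):
    model.update(window)
print(model.predict(("rainy", "cool")))
```

Because only counts are retained, memory in this sketch grows with the number of distinct attribute values and classes rather than with the length of the stream, which is the property the abstract attributes to its storage scheme.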

Keywords: data stream; bootstrap sampling; Bayesian classification; sliding window; incremental storage tree

Classification code: TP311.13 [Automation and Computer Technology: Computer Software and Theory]

 
