Partition Approach for Continuous Attributes Based on Statistical Criterion


Authors: GAO Hong-tao [1], LU Wei [2], YANG Yu-wang [2]

Affiliations: [1] Department of Cyber Crime Investigation, Criminal Investigation Police University of China, Shenyang 110035, China; [2] School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Source: Science Technology and Engineering, 2018, No. 16, pp. 237-240 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (61640020)

Abstract: Many decision tree classification algorithms, such as ID3, C4.5, and C5.0, rely on discrete attributes and require the range of each continuous attribute to be quantized into a finite number of intervals. This paper presents a new partition method for continuous attributes that uses a statistical criterion to identify precisely which discrete intervals should be merged. Building on this method, a heuristic partition algorithm is also proposed to obtain good partition results and improve the classification performance of decision tree learning. A series of simulation experiments was carried out on real UCI data sets. The results and performance analysis show that the approach achieves relatively high classification accuracy, compares well with common partition algorithms, and that the C4.5 decision tree classification algorithm benefits considerably from it.
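
The abstract does not say which statistical criterion or heuristic the authors use, so the following is only a minimal sketch of the general idea of merging adjacent intervals under a statistical test. It assumes a Pearson chi-square statistic in the style of the well-known ChiMerge procedure, which may differ from the paper's criterion. The function names `chi2_adjacent` and `merge_intervals`, the default `threshold` of 2.706 (the 0.90 chi-square critical value for one degree of freedom), and the `min_intervals` stopping rule are all illustrative assumptions.

```python
from collections import Counter


def chi2_adjacent(a, b):
    """Pearson chi-square statistic for the class counts of two adjacent intervals."""
    classes = set(a) | set(b)
    total = sum(a.values()) + sum(b.values())
    stat = 0.0
    for row in (a, b):
        row_total = sum(row.values())
        for c in classes:
            # Expected count if class and interval were independent.
            expected = row_total * (a[c] + b[c]) / total
            if expected > 0:
                stat += (row[c] - expected) ** 2 / expected
    return stat


def merge_intervals(values, labels, threshold=2.706, min_intervals=2):
    """Bottom-up merging (assumed chi-square criterion, not the paper's own).

    Start with one interval per distinct value, then repeatedly merge the
    adjacent pair with the smallest statistic while it stays below `threshold`.
    Returns the cut points (lower bounds of every interval except the first).
    """
    grouped = {}
    for v, y in zip(values, labels):
        grouped.setdefault(v, Counter())[y] += 1
    intervals = [(v, grouped[v]) for v in sorted(grouped)]

    while len(intervals) > min_intervals:
        stats = [chi2_adjacent(intervals[i][1], intervals[i + 1][1])
                 for i in range(len(intervals) - 1)]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # every adjacent pair differs significantly; stop merging
        lower, counts = intervals[i]
        intervals[i:i + 2] = [(lower, counts + intervals[i + 1][1])]
    return [lower for lower, _ in intervals[1:]]
```

Under these assumptions, the returned cut points could be used to bin a continuous attribute before passing it to a decision tree learner that expects discrete values.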

Keywords: continuous attribute values; learning accuracy; statistical criterion; classification algorithm

Classification code: TP393.03 [Automation and Computer Technology: Computer Application Technology]

 
