数据集分类可用性评估的置信区间方法  被引量:8

Confidence Interval Method for Classification Usability Evaluation of Data Sets

在线阅读下载全文

作  者:谈询滔 顾依依 阮彤[1] 袁玉波[1] TAN Xun-tao;GU Yi-yi;RUAN Tong;YUAN Yu-bo(Department of Computer Science and Engineering,East China University of Science and Technology,Shanghai 200237,China)

机构地区:[1]华东理工大学计算机科学与工程系,上海200237

出  处:《计算机科学》2019年第1期78-85,共8页Computer Science

基  金:国家自然科学基金项目(61772201);上海市科委基金项目(16511101000);上海市科委基金项目(17DZ11011003)资助

摘  要:如何有效评价训练数据集的可用性,一直是困扰智能分类系统应用的难点问题。针对机器学习领域的数据分类问题,提出了一种基于区间分析和信息粒化的数据集分类可用性的评估方法,用于评价数据集的可分程度。该方法将待评估的数据集定义为分类信息系统,提出了分类置信区间的概念,通过区间分析进行信息粒化。在此信息粒化策略下,定义分类可用性的数学模型,并进一步给出单个属性以及整体数据集的分类可用性的计算方法。选择18个UCI标准数据集作为评估对象,给出了部分数据集分类可用性的评估结果,并且选取3种分类器对所选数据集进行分类实验,最终通过对上述实验结果的分析证明了该评估方法的有效性和可行性。It is always a difficult problem to evaluate the usability of training data sets effectively,which hinders the application of intelligent classification systems.Aiming at the issue of data classification in the field of machine learning,based on interval analysis and information granulation,this paper proposed an evaluation method of data classification usability to measure the separability of data sets.In this method,dataset is defined as the classification information system,and the concept of classification confidence interval is put forward,then the information granulation is carried out by interval analysis.Under this information granulation strategy,this paper defined the mathematical model of classification usability,and further gave the calculation method of the classification usability for single attribute and the total data set.In this paper,18 UCI standard data sets were selected as evaluation objects,the evaluation results of classification usability were given,and 3 classifiers were selected to classify the above data sets.Finally,the effectiveness and feasibility of this evaluation method are verified by the analysis of experimental results.

关 键 词:数据可用性 分类系统 区间分析 信息粒化 分类可用性 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象