Measurements of Discretization Schemes (离散化方案的度量)  Cited by: 1


Authors: Wang Lihong (王立宏)[1], Wu Gengfeng (吴耿锋)[2]

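Affiliations: [1] School of Computer Science and Technology, Yantai University, Yantai 264005; [2] School of Computer Engineering and Technology, Shanghai University, Shanghai 200072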

Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), 2008, No. 4, pp. 494-499 (6 pages)

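Funding: Supported by the National Natural Science Foundation of China (No. 60772028) and the Natural Science Foundation of Shandong Province (No. Y2006G22)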

Abstract: This paper analyzes measures for evaluating discretization schemes of numerical (continuous-valued) decision tables, including the number of cut points, conditional information entropy, granular entropy, class-attribute mutual information, and class-attribute interdependence redundancy. For a consistent decision table, the conditional information entropy and the class-attribute mutual information are both constants, so they give no further guidance for choosing among discretization schemes. The relationship between granular entropy and interdependence redundancy is discussed, and granular entropy is proved to increase as cut points are added to a scheme. A hybrid discretization algorithm is proposed to generate discretization schemes for testing, and experiments are designed to measure the relationships among these measures. The results show that the number of cut points and granular entropy correlate with prediction (classification) accuracy to about the same degree, and that the strength of the correlation depends on the dataset.
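The measures named in the abstract can be illustrated on a toy decision table. The following is a minimal sketch, not the authors' implementation. The abstract does not give formulas, so the code assumes that granular entropy is the Shannon entropy H(C) of the partition induced by the discretized condition attributes, and that interdependence redundancy is I(C;D) / H(C,D); the function names `entropy`, `discretize`, and `measures` and the sample data are hypothetical.

```python
import math
from bisect import bisect_right
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by hashable labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretize(rows, cuts):
    """Map each numeric row to a tuple of interval indices, one per attribute.

    cuts[j] is the sorted list of cut points for attribute j.
    """
    return [tuple(bisect_right(cuts[j], v) for j, v in enumerate(row))
            for row in rows]


def measures(rows, classes, cuts):
    """Compute the measures for one discretization scheme (assumed definitions)."""
    cells = discretize(rows, cuts)
    h_c = entropy(cells)                       # assumed granular entropy H(C)
    h_d = entropy(classes)                     # class entropy H(D)
    h_cd = entropy(list(zip(cells, classes)))  # joint entropy H(C, D)
    mi = h_c + h_d - h_cd                      # mutual information I(C; D)
    return {
        "cut_points": sum(len(c) for c in cuts),
        "granular_entropy": h_c,
        "conditional_entropy": h_cd - h_c,     # H(D | C)
        "mutual_information": mi,
        "redundancy": mi / h_cd if h_cd > 0 else 0.0,  # assumed I(C;D) / H(C,D)
    }


# Toy table: one numeric attribute, two classes; both schemes keep it consistent.
rows = [(1.0,), (2.0,), (3.0,), (4.0,)]
classes = ["a", "a", "b", "b"]

coarse = measures(rows, classes, [[2.5]])
fine = measures(rows, classes, [[1.5, 2.5]])
print(coarse)
print(fine)
```

Under these assumed definitions, both schemes leave the toy table consistent, so H(D|C) stays at 0 and I(C;D) stays equal to H(D), while the granular entropy grows when the extra cut point is added and the redundancy falls accordingly.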

Keywords: granular entropy; discretization scheme; cut point; classification accuracy; rough set

Classification code: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
