Condensed Nearest Neighbor Rules Based on Rough Set Technique  Cited by: 1

Authors: Zhai Junhai (翟俊海)[1], Li Shengjie (李胜杰)[1], Wang Xizhao (王熙照)[1]

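Affiliation: [1] Key Laboratory of Machine Learning and Computational Intelligence of Hebei Province, College of Mathematics and Computer Science, Hebei University, Baoding 071002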

Source: Computer Science (《计算机科学》), 2012, No. 2, pp. 236-239 (4 pages)

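Funding: Supported by the National Natural Science Foundation of China (60903088), the Natural Science Foundation of Hebei Province, and the Key Science and Technology Foundation for Higher Education Institutions of Hebei Province (F2010000323; ZD2010139)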

Abstract: The nearest neighbor (NN) algorithm is a simple and practical supervised classification algorithm. To classify an instance without a class label, however, it must store the whole training set and compute the distance from that instance to every training instance, so its memory and response-time requirements are very high. To overcome this drawback, P. Hart proposed the condensed nearest neighbor (CNN) rule, which finds a consistent subset (CS) of the training set, that is, a subset that correctly classifies all the remaining instances in the training set. The computational complexity of CNN is still high: finding a consistent subset of a large database is especially time-consuming. To address this problem, this paper proposes a CNN algorithm based on rough set techniques. The algorithm consists of three stages. First, an attribute reduct is computed with the rough set method (feature selection) to remove superfluous attributes. Second, the instances near the boundary region are selected, which removes redundant instances. Finally, a consistent subset is computed from the selected instances. The algorithm therefore reduces the data in both the vertical and the horizontal direction. Experimental results show that the proposed method is effective and efficient.
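The final stage described in the abstract, finding a consistent subset, can be illustrated with a minimal sketch of Hart-style condensation. This is not the authors' implementation: the function names `nearest_label`, `condensed_subset`, and `is_consistent` are hypothetical, Euclidean distance and 1-NN are assumed, and the rough-set stages (attribute reduction and boundary-region selection) are not shown.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def nearest_label(kept, X, y, x):
    """Return the label of the kept instance closest to x (1-NN rule)."""
    best = min(kept, key=lambda j: dist(X[j], x))
    return y[best]


def condensed_subset(X, y):
    """Grow a set of indices until it classifies every instance of (X, y) correctly.

    Assumption: no two identical instances carry different labels;
    otherwise no consistent subset exists.
    """
    kept = [0]                      # seed the store with the first instance
    changed = True
    while changed:                  # repeat passes until a pass adds nothing
        changed = False
        for i in range(len(X)):
            if i in kept:
                continue
            if nearest_label(kept, X, y, X[i]) != y[i]:
                kept.append(i)      # misclassified: move it into the store
                changed = True
    return kept


def is_consistent(kept, X, y):
    """Check that the kept instances classify the whole training set correctly."""
    return all(nearest_label(kept, X, y, X[i]) == y[i] for i in range(len(X)))


# Toy data: two well-separated one-dimensional clusters.
X = [(0.0,), (0.1,), (0.2,), (1.0,), (1.1,), (1.2,)]
y = [0, 0, 0, 1, 1, 1]
kept = condensed_subset(X, y)
```

Each pass scans the full training set, which is why the cost grows quickly on large databases; running the same routine on a pre-filtered set of instances with fewer attributes, as the abstract proposes, shrinks both the number of distance computations and the cost of each one.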

Keywords: nearest neighbor rule; consistent subset; instance selection; rough set; boundary region

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]

 
