Improved KNN algorithm applied to text categorization (times cited: 6)

Authors: Wang Yu[1], Zhang Ming[1], Wang Zhengou[2], Bai Shi

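Affiliations: [1] College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002; [2] Institute of Systems Engineering, Tianjin University, Tianjin 300072; [3] Cangzhou Urban Construction Archives, Cangzhou, Hebei 061000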

Source: Computer Engineering and Applications, 2007, Issue 13, pp. 159-162, 166 (5 pages in total)

Funding: National Natural Science Foundation of China (Grant No. 60275020).

Abstract: Drawing on neural network theory, the weights of the text features in the distance formula are first corrected using a sensitivity method. A method based on the CURE and Tabu algorithms is then proposed for pruning the training sample set of the KNN algorithm. First, the CURE clustering algorithm is used to obtain representative samples of each cluster, and these form a new training sample set. This set is taken as the initial solution of the Tabu algorithm and is further maintained by adding or deleting samples. When samples are added, only samples lying at the borders between different classes are considered. Samples are added or deleted according to two principles: the highest classification accuracy, and the closest similarity to the original training sample set. The effort needed to prune and maintain the training sample set is greatly reduced, and satisfactory classification speed and accuracy are both achieved.
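The abstract states only that feature weights in the distance formula are corrected with a sensitivity method derived from neural network theory; it gives no formula. The sketch below is one hedged reading, borrowed from sensitivity-based network pruning: the sensitivity S_j of feature j is taken to be the drop in leave-one-out KNN accuracy when that feature is switched off, and its weight becomes max(floor, 1 + S_j). The function names, the leave-one-out criterion and the `floor` parameter are assumptions for illustration, not the paper's method.

```python
import math
from collections import Counter

def weighted_distance(a, b, w):
    """Euclidean distance in which feature j contributes w[j] * (a[j] - b[j])**2."""
    return math.sqrt(sum(wj * (p - q) ** 2 for wj, p, q in zip(w, a, b)))

def knn_predict(train_X, train_y, x, w, k=3):
    """Majority label among the k training vectors nearest to x under weighted_distance."""
    order = sorted(range(len(train_X)), key=lambda i: weighted_distance(train_X[i], x, w))
    return Counter(train_y[i] for i in order[:k]).most_common(1)[0][0]

def loo_accuracy(X, y, w, k=3):
    """Leave-one-out accuracy of weighted KNN on the training set (X, y)."""
    hits = 0
    for i in range(len(X)):
        hits += knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], w, k) == y[i]
    return hits / len(X)

def sensitivity_weights(X, y, k=3, floor=0.01):
    """Hypothetical correction rule: w_j = max(floor, 1 + S_j), where
    S_j = LOO accuracy with all features - LOO accuracy with feature j switched off."""
    n = len(X[0])
    base = loo_accuracy(X, y, [1.0] * n, k)
    weights = []
    for j in range(n):
        off = [0.0 if m == j else 1.0 for m in range(n)]
        s_j = base - loo_accuracy(X, y, off, k)
        weights.append(max(floor, 1.0 + s_j))  # features whose removal helps are damped
    return weights
```

On a toy set where the second feature is noise, this rule keeps the first weight at 1.0 and damps the second to the floor, after which leave-one-out accuracy rises from 0 to 1 with `k=1`.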
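The CURE step can be illustrated with a small sketch of agglomerative CURE: each cluster is represented by up to `c` well-scattered points shrunk toward its centroid by a fraction `alpha`, and the two clusters with the closest representatives are merged until `n_clusters` remain. Running CURE separately within each class, and keeping the unshrunk scattered samples (actual documents) as the new training set, are assumptions made here; the abstract says only that representative samples of each cluster form the new set. `reduce_training_set` is a hypothetical helper name.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def scattered(idx, X, c):
    """Indices of up to c well-scattered members of a cluster (farthest-point heuristic)."""
    cen = centroid([X[i] for i in idx])
    chosen = [max(idx, key=lambda i: math.dist(X[i], cen))]
    while len(chosen) < min(c, len(idx)):
        rest = [i for i in idx if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(math.dist(X[i], X[j]) for j in chosen)))
    return chosen

def shrunk_reps(idx, X, c, alpha):
    """CURE representatives: scattered points moved a fraction alpha toward the centroid."""
    cen = centroid([X[i] for i in idx])
    return [[v + alpha * (m - v) for v, m in zip(X[i], cen)] for i in scattered(idx, X, c)]

def cure(idx, X, n_clusters, c=3, alpha=0.3):
    """Agglomerative CURE over the samples in idx; returns clusters as lists of indices."""
    clusters = [[i] for i in idx]
    reps = [shrunk_reps(cl, X, c, alpha) for cl in clusters]
    while len(clusters) > n_clusters:
        # pick the pair of clusters whose closest representatives are nearest
        _, a, b = min(
            (min(math.dist(p, q) for p in reps[a] for q in reps[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters[a] + clusters[b]
        del clusters[b], reps[b]  # b > a, so index a is unaffected
        clusters[a], reps[a] = merged, shrunk_reps(merged, X, c, alpha)
    return clusters

def reduce_training_set(X, y, clusters_per_class, c=3, alpha=0.3):
    """Hypothetical helper: cluster each class with CURE and keep the scattered samples."""
    keep = []
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        for cl in cure(idx, X, min(clusters_per_class, len(idx)), c, alpha):
            keep.extend(scattered(cl, X, c))
    return sorted(keep)
```

With `c=1` each cluster contributes a single sample, which gives the most aggressive pruning; a larger `c` keeps more of each cluster's shape at the cost of a bigger training set.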
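The Tabu maintenance step might look roughly like the following sketch, assuming KNN classification, set-valued solutions and a short tabu list of recently changed sample indices. Candidate additions are restricted to border samples (here, as an assumption: samples whose nearest neighbour in the original set carries a different label). Subsets are compared lexicographically, first by accuracy on the original training set and then by the mean distance from each original sample to its nearest kept sample, as a stand-in for "closest to the original training set". The move set, tenure, aspiration rule and similarity measure are illustrative choices, not taken from the paper.

```python
import math
from collections import Counter, deque

def predict(S, X, y, x, k=1):
    """k-NN label of x using only the training indices in S."""
    ranked = sorted(S, key=lambda i: math.dist(X[i], x))
    return Counter(y[i] for i in ranked[:k]).most_common(1)[0][0]

def score(S, X, y, k=1):
    """Lexicographic objective: (accuracy on the original set, -mean distance to nearest kept sample)."""
    n = len(X)
    acc = sum(predict(S, X, y, X[j], k) == y[j] for j in range(n)) / n
    cover = sum(min(math.dist(X[j], X[i]) for i in S) for j in range(n)) / n
    return (acc, -cover)

def border_samples(X, y):
    """Indices whose nearest other sample carries a different label (class-boundary samples)."""
    out = []
    for j in range(len(X)):
        nn = min((i for i in range(len(X)) if i != j), key=lambda i: math.dist(X[i], X[j]))
        if y[nn] != y[j]:
            out.append(j)
    return out

def tabu_maintain(S0, X, y, iters=20, tenure=3, k=1):
    """Tabu search over subsets: add border samples or delete kept ones; return the best subset seen."""
    current, best = set(S0), set(S0)
    best_score = score(best, X, y, k)
    border = border_samples(X, y)
    tabu = deque(maxlen=tenure)  # recently touched sample indices
    for _ in range(iters):
        moves = [("add", i) for i in border if i not in current]
        if len(current) > k:  # never shrink below k kept samples
            moves += [("del", i) for i in sorted(current)]
        candidates = []
        for kind, i in moves:
            nxt = current | {i} if kind == "add" else current - {i}
            s = score(nxt, X, y, k)
            if i not in tabu or s > best_score:  # aspiration: a tabu move is allowed if it beats the best
                candidates.append((s, nxt, i))
        if not candidates:
            break
        s, current, i = max(candidates, key=lambda c: c[0])  # best admissible neighbour, even if worse
        tabu.append(i)
        if s > best_score:
            best, best_score = set(current), s
    return sorted(best), best_score
```

Starting from one sample per class on a one-dimensional toy set, the search adds the two border samples and reaches full accuracy on the original set. Because this objective has no explicit size term, the returned subset is the best-scoring one visited rather than the smallest; a practical version would probably need to weigh subset size as well.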

Keywords: text categorization; KNN algorithm; sensitivity method; CURE clustering algorithm; Tabu algorithm

CLC number: TP311 [Automation and Computer Technology: Computer Software and Theory]
