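Authors: 郝秀兰 (Hao Xiulan)[1,2], 陶晓鹏 (Tao Xiaopeng)[1], 王述云 (Wang Shuyun), 徐和祥 (Xu Hexiang)[1,3], 胡运发 (Hu Yunfa)[1]
Affiliations: [1] School of Computer Science and Technology, Fudan University, Shanghai 200433; [2] School of Information Engineering, Huzhou Teachers College, Huzhou 313000; [3] Shanghai Distance Education Group, Shanghai 200092
Source: 《模式识别与人工智能》 (Pattern Recognition and Artificial Intelligence), 2009, No. 5, pp. 709-717 (9 pages)
Funding: National Natural Science Foundation of China (No. 60736016)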
Abstract: As an instance-based method, the k-nearest neighbor (kNN) classifier has heavy computational and storage requirements. In addition, an imbalanced distribution of training data degrades its performance. To address these defects, the paper proposes a sampling method that combines feature selection with the Condensing technique, aiming to reduce the computation and storage costs of kNN classification while preserving classifier performance. The proposed algorithm has two steps. First, traditional feature selection methods are used to produce the features of each class of training data. Then, redundant training instances are removed by combining the class features contained in each document with the Condensing strategy. Extensive experiments indicate that using the samples obtained in this way as the training set greatly reduces the time and space cost of kNN, and also improves classifier performance because noisy data are removed from the training set.
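Keywords: text categorization; k-nearest neighbor (kNN); sampling; feature selection; Condensing algorithm
Classification number: TP391.41 [Automation and Computer Technology: Computer Application Technology]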
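The abstract does not spell out the Condensing step, so the following is only a minimal sketch of the classic condensed nearest neighbor idea that the name usually refers to, not the authors' method. It assumes a 1-NN rule with Euclidean distance on toy 2-D points; the function names `nearest_label` and `condense` and the sample data are hypothetical, and the paper's class-feature criterion for documents is not modeled here.

```python
from math import dist

def nearest_label(store, x):
    """Label of the stored example closest to x (1-NN, Euclidean distance)."""
    return min(store, key=lambda ex: dist(ex[0], x))[1]

def condense(train):
    """Keep only the examples that the current store misclassifies."""
    store = [train[0]]            # seed the store with the first example
    changed = True
    while changed:                # repeat until a full pass adds nothing
        changed = False
        for x, y in train:
            if (x, y) in store:
                continue          # already kept
            if nearest_label(store, x) != y:
                store.append((x, y))   # misclassified, so it is needed
                changed = True
    return store

# Toy two-class data (hypothetical, for illustration only)
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b"),
]
condensed = condense(train)
```

On this toy set the condensed store keeps one example per class, yet a 1-NN rule over it still labels every original training point correctly, which is the sense in which the removed instances are redundant.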