Fast Text Classification Based on Dynamical Generation of Representative Samples


Authors: Hua Bei [1], Cao Xianbin [1,2]

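Affiliations: [1] Department of Computer Science and Technology, University of Science and Technology of China; [2] Anhui Province Key Laboratory of Software in Computing and Communication, Hefei, Anhui 230027, China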

Source: Computer Simulation (《计算机仿真》), 2007, No. 6, pp. 322-325 (4 pages)

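Funding: National Natural Science Foundation of China (60204009); Open Fund of the Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences (20040104); National 973 Program project (2004CB318109).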

Abstract: The k-nearest neighbor (kNN) method is a simple, effective, nonparametric classifier that is widely used in text classification, but it is computationally expensive. To address this weakness, the paper proposes a new fast text classification method. It first generates representative samples by training on the original training set, then adjusts them repeatedly according to how the original training samples are distributed relative to the representatives generated so far, so that the representatives describe the data better. The method compresses the original training set effectively, which improves classification efficiency. Because the adjusted representatives are more reasonably distributed, it can also improve classification accuracy. Experimental results show that the method achieves good classification performance.
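The abstract does not give the generation or adjustment rules, so the following is only a minimal sketch of the general idea of classifying against a compressed set of representatives. It assumes a per-class k-means-style update as a stand-in for the paper's procedure, Euclidean distance on dense feature vectors, and a majority vote among the nearest representatives. The function names `generate_representatives` and `classify` and the parameters `per_class`, `rounds`, and `k` are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict
from math import dist


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def generate_representatives(samples, labels, per_class=2, rounds=10):
    """Compress a labeled training set into a few representatives per class.

    Assumption: a per-class k-means-style update stands in for the paper's
    generation and adjustment steps, which the abstract does not specify.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(tuple(x))

    representatives = []  # list of (vector, label) pairs
    for label, members in by_class.items():
        # Initial generation: seed with the first few samples of the class.
        reps = members[:per_class]
        for _ in range(rounds):
            # Group every original sample with its nearest representative.
            groups = [[] for _ in reps]
            for x in members:
                nearest = min(range(len(reps)), key=lambda i: dist(x, reps[i]))
                groups[nearest].append(x)
            # Adjustment: move each representative to the mean of its group;
            # a representative with an empty group stays where it is.
            updated = [mean_vector(g) if g else r for g, r in zip(groups, reps)]
            if updated == reps:
                break  # no representative moved, so further rounds are no-ops
            reps = updated
        representatives.extend((r, label) for r in reps)
    return representatives


def classify(x, representatives, k=1):
    """Majority vote among the k representatives nearest to x."""
    nearest = sorted(representatives, key=lambda rep: dist(x, rep[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

In this sketch the saving comes from `classify` comparing a query against the representatives only, rather than against every training sample as plain kNN does. Real text features are usually sparse term-weight vectors compared by cosine similarity, so the distance function here is a simplification.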

Keywords: text classification; representative samples; fast classification

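Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]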

 
