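Affiliations: [1] Department of Computer Science and Technology, University of Science and Technology of China; [2] Anhui Province Key Laboratory of Computing and Communication Software, Hefei, Anhui 230027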
Source: Computer Simulation (《计算机仿真》), 2007, No. 6, pp. 322-325 (4 pages)
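Funding: National Natural Science Foundation of China (60204009); Open Fund of the Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences (20040104); 973 Program project (2004CB318109).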
Abstract: The k-nearest neighbor (k-NN) method is a simple, effective, nonparametric classifier that is widely used in text classification, but it is computationally expensive. To address this weakness, the paper proposes a new fast text classification method. It first generates representative samples by training on the original training set, then repeatedly adjusts those representatives according to how the original training samples are distributed relative to the representatives already generated, so that the representatives better characterize the data. The method effectively compresses the original training set, which improves classification efficiency; because the adjusted representatives are more sensibly distributed, it can also improve classification accuracy. Experimental results show that the method achieves good classification performance.
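Classification codes: TP391 [Automation and Computer Technology: Computer Application Technology]; TP18 [Automation and Computer Technology: Computer Science and Technology]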
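
The abstract describes the pipeline only at a high level (generate representatives, adjust them repeatedly, classify against the compressed set), so the following is a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes documents are already dense term-weight vectors, uses cosine similarity, and substitutes a k-means-style refinement within each class for the paper's unspecified adjustment rule. The names `build_representatives`, `classify`, `per_class`, and `rounds` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_representatives(samples, labels, per_class=2, rounds=5):
    """Compress the training set into at most `per_class` representatives per class.

    Assumption: a k-means-style refinement stands in for the paper's
    adjustment step, whose details the abstract does not give.
    """
    by_class = defaultdict(list)
    for vec, lab in zip(samples, labels):
        by_class[lab].append(vec)

    reps = []  # list of (vector, label) pairs
    for lab, vecs in by_class.items():
        # Initial representatives: the first few samples of the class.
        centers = [list(v) for v in vecs[:per_class]]
        for _ in range(rounds):
            groups = [[] for _ in centers]
            for v in vecs:
                best = max(range(len(centers)), key=lambda i: cosine(v, centers[i]))
                groups[best].append(v)
            # Move each representative to the mean of the samples it attracts;
            # a representative that attracts nothing stays where it is.
            centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        reps.extend((c, lab) for c in centers)
    return reps


def classify(vec, reps, k=1):
    """k-NN vote over the representatives instead of the full training set."""
    nearest = sorted(reps, key=lambda r: cosine(vec, r[0]), reverse=True)[:k]
    return Counter(lab for _, lab in nearest).most_common(1)[0][0]
```

Calling `build_representatives(train_vectors, train_labels)` once and then `classify(doc_vector, reps)` per document means each query is compared against only a handful of representatives per class rather than every training sample, which is where the efficiency gain claimed in the abstract would come from in a scheme of this kind.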