Source: Information Studies: Theory & Application (《情报理论与实践》), 2014, No. 5, pp. 102-106 (5 pages)
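Funding: An outcome of the National Natural Science Foundation of China project "Research on Multidisciplinary Collaborative Modeling Theory and Experiments for Text Classification"; project No. 71373291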
Abstract: This paper describes the basic principle of the traditional KNN classifier and its shortcomings. To address the sharp growth in similarity computation that KNN incurs as the number of samples and the dimensionality increase, it proposes an improved KNN classification algorithm based on a dimension index table. The algorithm builds a dimension index table of feature terms to speed up the search for the K nearest neighbors. Using news documents from the text categorization corpus of the Sogou Natural Language Lab as the experimental data and the macro-averaged F-measure as the evaluation criterion, the improved KNN method is compared experimentally with the traditional KNN method. The results show that the proposed method greatly reduces the number of similarity computations needed to find the K nearest neighbors.
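The abstract does not give the details of the dimension index table, so the following is only a minimal sketch under stated assumptions: documents are sparse dictionaries of non-negative feature weights, the index table is read as an inverted index from each feature dimension to the training documents with a nonzero weight in it, and similarity is cosine similarity. The function names (`build_dimension_index`, `knn_classify`) and the similarity-weighted vote are illustrative choices, not the paper's. Under these assumptions, a training document that shares no dimension with the query has similarity 0, so skipping it cannot change which neighbors have positive similarity.

```python
from collections import defaultdict
import heapq
import math


def build_dimension_index(train_vectors):
    """Map each feature dimension to the ids of training docs with a nonzero weight in it."""
    index = defaultdict(set)
    for doc_id, vec in enumerate(train_vectors):
        for dim, weight in vec.items():
            if weight != 0:
                index[dim].add(doc_id)
    return index


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {dimension: weight} dicts."""
    dot = sum(w * b.get(d, 0.0) for d, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(query, train_vectors, train_labels, index, k=3):
    """Return (predicted label, number of similarity computations performed)."""
    # Candidate set: only docs that share at least one nonzero dimension with the query.
    candidates = set()
    for dim, weight in query.items():
        if weight != 0:
            candidates |= index.get(dim, set())

    # Similarity is computed for candidates only, not for the whole training set.
    scored = [(cosine(query, train_vectors[i]), i) for i in candidates]
    top_k = heapq.nlargest(k, scored)

    # Similarity-weighted vote among the k nearest candidates (an assumed voting rule).
    votes = defaultdict(float)
    for sim, i in top_k:
        votes[train_labels[i]] += sim
    label = max(votes, key=votes.get) if votes else None
    return label, len(candidates)


# Toy data, invented for illustration only.
train = [
    {"football": 1.0, "match": 0.5},
    {"football": 0.8, "goal": 0.6},
    {"stock": 1.0, "market": 0.7},
    {"market": 0.9, "bank": 0.4},
]
labels = ["sports", "sports", "finance", "finance"]
index = build_dimension_index(train)

label, n_computed = knn_classify({"football": 0.9, "goal": 0.3}, train, labels, index, k=3)
print(label, n_computed, "of", len(train))
```

On this toy data the query shares dimensions with only two of the four training documents, so two similarities are computed rather than four. How large the saving is on a real corpus depends on how sparse the feature vectors are.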
Classification No.: TP391.3 [Automation and Computer Technology: Computer Application Technology]