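Author: MA Shaogui (马召贵) (Nanjing Engineering Branch of Jiangsu United Vocational and Technical College, Nanjing, Jiangsu 211135, China)
Affiliation: [1] Nanjing Engineering Branch, Jiangsu United Vocational and Technical College, Nanjing, Jiangsu 211135
Source: Information & Computer (《信息与电脑》), 2023, No. 12, pp. 85-87 (3 pages)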
Abstract: Conventional text classification algorithms extract text features incompletely. To address this, the paper proposes a classification algorithm for imbalanced information text based on an improved K-Nearest Neighbor (KNN) method. First, the imbalanced text is preprocessed in two steps, word segmentation and stop-word removal, so that useless data does not interfere with the classification results. Second, a mutual-information feature extraction method extracts features from the imbalanced text and measures how strongly each feature word is correlated with each category. Finally, the improved KNN principle groups each imbalanced text sample under test with its nearest neighbors, and the text classification algorithm is built on this step. Experimental results show that the algorithm's classification precision stays above 98% throughout, outperforming the control group.
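The three steps in the abstract (preprocessing, mutual-information feature selection, improved KNN) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation. Whitespace tokenization with a hypothetical stop-word list stands in for Chinese word segmentation. Mutual information is computed as the standard expected MI between "term occurs" and "document belongs to class". The "improved" KNN is stood in for by a similarity-weighted vote divided by each class's training size, which is one common correction for class imbalance. This record does not describe the paper's actual improvement. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list (assumption): the paper's list is not given.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "to", "in"}


def preprocess(text):
    """Step 1: tokenize and drop stop words.

    Whitespace splitting stands in for Chinese word segmentation (assumption).
    """
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def mutual_information(doc_sets, labels, term, cls):
    """Expected mutual information between 'term occurs' and 'label == cls'."""
    n = len(doc_sets)
    mi = 0.0
    for has_term in (True, False):
        for in_class in (True, False):
            n_tc = sum(1 for d, y in zip(doc_sets, labels)
                       if (term in d) == has_term and (y == cls) == in_class)
            if n_tc == 0:
                continue  # 0 * log(0) is taken as 0
            n_t = sum(1 for d in doc_sets if (term in d) == has_term)
            n_c = sum(1 for y in labels if (y == cls) == in_class)
            # P(t,c) * log(P(t,c) / (P(t) * P(c))), written with raw counts
            mi += (n_tc / n) * math.log(n * n_tc / (n_t * n_c))
    return mi


def select_features(token_docs, labels, k):
    """Step 2: keep the k terms with the highest MI against any class."""
    doc_sets = [set(d) for d in token_docs]
    vocab = sorted(set().union(*doc_sets))
    classes = sorted(set(labels))
    score = {t: max(mutual_information(doc_sets, labels, t, c) for c in classes)
             for t in vocab}
    # Sort by descending score; ties are broken alphabetically for determinism.
    return sorted(vocab, key=lambda t: (-score[t], t))[:k]


def vectorize(tokens, features):
    """Term-frequency vector over the selected features."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_knn(train_vecs, labels, query, k=3):
    """Step 3: KNN vote weighted by similarity and divided by class size.

    The 1/class-size factor is an assumed imbalance correction, not
    necessarily the paper's 'improved KNN'.
    """
    class_size = Counter(labels)
    ranked = sorted(zip(train_vecs, labels), key=lambda p: -cosine(p[0], query))
    votes = Counter()
    for vec, y in ranked[:k]:
        votes[y] += cosine(vec, query) / class_size[y]
    return max(votes, key=votes.get)


# Hypothetical toy corpus, deliberately imbalanced (6 finance vs 2 sport).
TRAIN_TEXTS = [
    "stock market rises", "market shares fall", "stock prices and market",
    "investors buy shares", "market rally continues", "bank stock gains",
    "team wins the match", "player scores goal in match",
]
TRAIN_LABELS = ["finance"] * 6 + ["sport"] * 2

tokens = [preprocess(t) for t in TRAIN_TEXTS]
features = select_features(tokens, TRAIN_LABELS, k=6)
vectors = [vectorize(t, features) for t in tokens]
query = vectorize(preprocess("The player scores a goal in the match"), features)
print(weighted_knn(vectors, TRAIN_LABELS, query, k=3))  # prints "sport"
```

Each neighbor's vote is divided by its class's training size. This keeps the majority class from winning only because it supplies more neighbors. The price is that a small class gets a larger per-neighbor weight, so a real system would tune or replace this factor.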
CLC number: TP391.41 [Automation and Computer Technology / Computer Application Technology]