Unbalanced Information Text Classification Algorithm Based on an Improved KNN  Cited by: 1


Author: MA Shaogui (马召贵) (Nanjing Engineering Branch of Jiangsu United Vocational and Technical College, Nanjing, Jiangsu 211135, China)

Affiliation: [1] Nanjing Engineering Branch of Jiangsu United Vocational and Technical College, Nanjing, Jiangsu 211135, China

Source: Information & Computer (《信息与电脑》), 2023, No. 12, pp. 85-87 (3 pages)

Abstract: Conventional text classification algorithms extract text features incompletely. To address this, an unbalanced information text classification algorithm based on an improved K-Nearest Neighbor (KNN) method is proposed. First, the unbalanced information text is preprocessed in two steps, word segmentation and stop-word removal, so that useless data does not interfere with the classification results. Second, a mutual information feature extraction method is used to extract features from the unbalanced information text and to obtain the degree of correlation between feature words and categories. Finally, the improved KNN principle is used to cluster the unbalanced text data under test by nearest neighbors, which yields the text classification algorithm. Experimental results show that the classification precision of the algorithm stays above 98% throughout, outperforming the control group.

Keywords: K-Nearest Neighbor (KNN); unbalanced; information text; classification algorithm

Classification code: TP391.41 [Automation and Computer Technology - Computer Application Technology]
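
Illustrative sketch: the three stages named in the abstract (preprocessing, mutual information feature scoring, KNN classification) can be roughly illustrated in Python as below. This is not the author's implementation. The abstract does not say how KNN was improved, so the class-size-normalized, similarity-weighted vote used here is an assumption, chosen only because it is a common way to offset class imbalance. The stop-word list, the toy training data, and every function name are hypothetical, and whitespace splitting stands in for Chinese word segmentation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would load a full list.
STOP_WORDS = {"the", "a", "of", "is", "and"}

def preprocess(text):
    """Split on whitespace (stand-in for word segmentation), drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def mutual_information(docs, labels):
    """Score each term by its maximum pointwise MI with any class,
    log(P(t, c) / (P(t) * P(c))), using document frequencies."""
    n = len(docs)
    class_count = Counter(labels)
    term_count, joint = Counter(), Counter()
    for tokens, c in zip(docs, labels):
        for t in set(tokens):
            term_count[t] += 1
            joint[(t, c)] += 1
    scores = {}
    for t in term_count:
        scores[t] = max(
            math.log(joint[(t, c)] * n / (term_count[t] * class_count[c]))
            for c in class_count if joint[(t, c)]
        )
    return scores

def select_features(scores, m=50):
    """Keep the m highest-scoring terms, returned in sorted order."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return sorted(ranked[:m])

def vectorize(tokens, vocab):
    """Term-count vector over the selected vocabulary."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_classify(query_vec, train_vecs, labels, k=3):
    """Assumed 'improvement': each neighbor's vote is its cosine similarity
    divided by the size of its class, so large classes do not dominate."""
    class_count = Counter(labels)
    ranked = sorted(
        ((cosine(query_vec, v), c) for v, c in zip(train_vecs, labels)),
        key=lambda pair: pair[0], reverse=True,
    )
    votes = defaultdict(float)
    for sim, c in ranked[:k]:
        votes[c] += sim / class_count[c]
    return max(votes, key=votes.get)

# Hypothetical, deliberately unbalanced toy corpus (4 finance, 1 sports).
TRAIN = [
    ("the stock market rises", "finance"),
    ("stock prices fall", "finance"),
    ("market shares and stock", "finance"),
    ("bank stock profit", "finance"),
    ("the team wins the match", "sports"),
]
docs = [preprocess(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
vocab = select_features(mutual_information(docs, labels))
train_vecs = [vectorize(d, vocab) for d in docs]

def classify(text, k=3):
    return knn_classify(vectorize(preprocess(text), vocab), train_vecs, labels, k)

print(classify("team match today"))
print(classify("stock market falls"))
```

With k=3 and only one sports document, a plain majority vote would label every query as finance. The normalized, similarity-weighted vote is one way to avoid that, but whether it resembles the paper's modification cannot be determined from the abstract.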

 
