KNN Text Classification Algorithm with Probabilistic Latent Semantic Analysis

Times cited: 3

Source: Computer Technology and Development (《计算机技术与发展》), 2017, No. 7, pp. 57-61 (5 pages)

Funding: National Natural Science Foundation of China (project no. 61302157)

Abstract: When computing the similarity between texts, the traditional KNN text classification algorithm performs only simple concept matching and ignores the semantic information carried by the terms in the training-set and test-set texts. As a result, classifying texts with a KNN classifier can lose semantic information and produce inaccurate results. To address this problem, a KNN text classification algorithm based on a probabilistic latent topic model is proposed. The algorithm first applies a probabilistic topic model to the training-set texts, modeling their document-topic and topic-term distributions, so that the semantic information carried by each text is mapped into a low-dimensional topic space. Text similarity is then expressed through these document-topic and topic-term probability distributions, and the KNN algorithm classifies texts using their low-dimensional semantic representations. Experimental results show that on relatively large training datasets and datasets to be classified, the proposed algorithm can perform semantic classification of texts with a KNN classifier and improves the accuracy, recall, and F1 score of KNN classification.
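The abstract outlines a pipeline: fit a PLSA topic model to the training texts, represent each text by its document-topic distribution P(z|d) in a low-dimensional topic space, and run KNN on those vectors. The Python sketch below illustrates that pipeline under stated assumptions; it is not the authors' implementation. Assumptions not taken from the paper: texts are already tokenized into integer word ids; an unseen text is mapped into topic space with the standard PLSA "fold-in" step (EM with P(w|z) held fixed); similarity is cosine similarity between topic vectors; and neighbors vote with similarity weights. All function names and the toy corpus are hypothetical.

```python
import math
import random
from collections import Counter


def normalize(values):
    """Rescale non-negative values to sum to 1 (uniform if they are all zero)."""
    total = sum(values)
    if total == 0.0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def em_step(counts, p_z_d, p_w_z, update_topics=True):
    """One PLSA EM iteration; counts[d] is a Counter mapping word id -> count."""
    n_topics, n_words = len(p_w_z), len(p_w_z[0])
    word_acc = [[0.0] * n_words for _ in range(n_topics)]
    new_p_z_d = []
    for d, doc in enumerate(counts):
        topic_acc = [0.0] * n_topics
        for w, n in doc.items():
            # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
            joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
            denom = sum(joint)
            if denom == 0.0:
                continue  # no topic can generate this word; skip it
            for z in range(n_topics):
                share = n * joint[z] / denom
                topic_acc[z] += share
                word_acc[z][w] += share
        # M-step for the document-topic distribution P(z | d).
        new_p_z_d.append(normalize(topic_acc))
    if update_topics:
        # M-step for the topic-word distribution P(w | z).
        p_w_z = [normalize(row) for row in word_acc]
    return new_p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d, w) * log(sum_z P(z | d) * P(w | z))."""
    total = 0.0
    for d, doc in enumerate(counts):
        for w, n in doc.items():
            p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
            total += n * math.log(p)
    return total


def train_plsa(docs, n_words, n_topics, n_iter=100, seed=0):
    """Fit P(z | d) and P(w | z) by EM; also return the log-likelihood history."""
    rng = random.Random(seed)
    counts = [Counter(doc) for doc in docs]
    p_z_d = [normalize([rng.random() for _ in range(n_topics)]) for _ in docs]
    p_w_z = [normalize([rng.random() for _ in range(n_words)]) for _ in range(n_topics)]
    history = [log_likelihood(counts, p_z_d, p_w_z)]
    for _ in range(n_iter):
        p_z_d, p_w_z = em_step(counts, p_z_d, p_w_z)
        history.append(log_likelihood(counts, p_z_d, p_w_z))
    return p_z_d, p_w_z, history


def fold_in(doc, p_w_z, n_iter=50):
    """Infer P(z | q) for an unseen text while keeping P(w | z) fixed."""
    n_topics, n_words = len(p_w_z), len(p_w_z[0])
    counts = [Counter(w for w in doc if 0 <= w < n_words)]  # drop unknown ids
    p_z_q = [[1.0 / n_topics] * n_topics]
    for _ in range(n_iter):
        p_z_q, _ = em_step(counts, p_z_q, p_w_z, update_topics=False)
    return p_z_q[0]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(query, train_topics, labels, k=3):
    """Similarity-weighted vote over the k training texts closest in topic space."""
    sims = [cosine(query, t) for t in train_topics]
    nearest = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)[:k]
    votes = Counter()
    for i in nearest:
        votes[labels[i]] += sims[i]
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: word ids 0-3 are "sports" terms, 4-7 are "tech" terms.
train_docs = [[0, 1, 1, 2], [1, 2, 3, 3], [0, 0, 3, 2],
              [4, 5, 5, 6], [5, 6, 7, 7], [4, 4, 7, 6]]
train_labels = ["sports"] * 3 + ["tech"] * 3
p_z_d, p_w_z, history = train_plsa(train_docs, n_words=8, n_topics=2)
query = fold_in([0, 2, 3, 1], p_w_z)
print(knn_classify(query, p_z_d, train_labels, k=3))
```

Comparing texts by their K-dimensional topic vectors instead of raw term vectors is what lets two texts with few shared words still count as neighbors when they load on the same topics. The `history` list can be used to check convergence, since each EM iteration should never decrease the training log-likelihood. PLSA is fit by EM and can stop at a local optimum, so results may vary with `seed`.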

Keywords: text classification; KNN algorithm; text representation model; semantic classification; probabilistic latent topic model

CLC number: TP301.6 [Automation and Computer Technology / Computer System Architecture]
