Text Metric Method on Statistical Manifold Learning

Citations: 1

Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2018, Issue 3, pp. 515-519 (5 pages)

Funding: Supported by the General Program of the National Natural Science Foundation of China (61673363)

Abstract: Traditional text classification methods, such as kernel methods and TF-IDF, ignore the semantic information of texts and words as well as the diversity of their topic distributions. Building on the assumption of a Gaussian-distribution topic model and on the statistical manifold learning framework, this paper proposes a text distance metric called Text Metric on Statistical Manifold (TMSM). TMSM extends the topic model: it uses a Gaussian mixture model to describe how words are distributed within topics, which yields a probabilistic representation of each text in terms of its distribution over topics. Within the statistical manifold learning framework, the distance between two texts is then measured as the distance between their probability models, and this metric is used in a classifier. Classification experiments on multiple datasets show that, compared with classic text classification methods, TMSM achieves better classification accuracy on all test datasets.
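The pipeline outlined in the abstract (Gaussian topics over words, a per-text probability model, a distance on the statistical manifold, then a classifier) can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: words are given as embedding vectors; each topic is an isotropic Gaussian with a known mean, a shared variance and a mixing weight (fitting these, e.g. by EM, is omitted); each text is reduced to its average topic responsibilities; and texts are compared with the Fisher-Rao geodesic distance on the probability simplex, 2·arccos(Σ√(p_i q_i)), inside a 1-nearest-neighbor classifier. This simplified distance is a stand-in for illustration, not the paper's TMSM metric, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def log_gaussian(x: Vec, mean: Vec, var: float) -> float:
    """Log density of an isotropic Gaussian N(mean, var * I) at x."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq / var - 0.5 * d * math.log(2.0 * math.pi * var)


def topic_proportions(words: List[Vec], means: List[Vec],
                      var: float, weights: List[float]) -> List[float]:
    """Average posterior responsibility of each Gaussian topic over a text's words.

    Assumption: topic parameters are already known (no EM fitting here).
    """
    k = len(means)
    totals = [0.0] * k
    for w in words:
        logs = [math.log(weights[j]) + log_gaussian(w, means[j], var) for j in range(k)]
        m = max(logs)  # log-sum-exp trick to avoid underflow
        exps = [math.exp(lg - m) for lg in logs]
        z = sum(exps)
        for j in range(k):
            totals[j] += exps[j] / z
    return [t / len(words) for t in totals]


def fisher_rao_distance(p: Vec, q: Vec) -> float:
    """Geodesic distance between two discrete distributions under the Fisher metric."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    bc = min(1.0, max(0.0, bc))  # clamp rounding error before acos
    return 2.0 * math.acos(bc)


def predict_1nn(query: Vec, train: List[Vec], labels: List[str]) -> str:
    """Label of the training text closest to the query on the simplex."""
    dists = [fisher_rao_distance(query, p) for p in train]
    return labels[dists.index(min(dists))]


# Toy example (hypothetical data): 2-D word embeddings, two topics at (0,0) and (5,5).
means = [(0.0, 0.0), (5.0, 5.0)]
var, weights = 1.0, [0.5, 0.5]
text_a = [(0.1, -0.2), (0.3, 0.1), (-0.2, 0.0)]
text_b = [(5.1, 4.9), (4.8, 5.2)]
query = [(0.2, 0.2), (0.0, -0.1), (5.0, 5.0)]

train = [topic_proportions(t, means, var, weights) for t in (text_a, text_b)]
q = topic_proportions(query, means, var, weights)
print(predict_1nn(q, train, ["A", "B"]))  # prints "A"
```

Here the query text has roughly two thirds of its words in the first topic, so its topic distribution lies closer to text A's on the simplex. Any distance defined on the space of probability models could replace `fisher_rao_distance` in the same classifier.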

Keywords: text classification; manifold learning; mixture model; topic model

CLC number: TP391 [Automation and Computer Technology / Computer Application Technology]
