Authors: ZHOU Guojian (周国剑), CHEN Qingchun (陈庆春)[1], LEI Xianfu (类先富)[1]
Affiliation: [1] Southwest Jiaotong University, Chengdu 611756
Source: Computer & Digital Engineering (《计算机与数字工程》), 2020, No. 10, pp. 2430-2435 (6 pages)
Abstract: Short texts are typically brief, with sparse features and little salient information, so applying traditional text classification methods to them directly generally gives unsatisfactory results. The probabilistic topics generated by the Latent Dirichlet Allocation (LDA) topic model help center a text on its semantics and reduce sparsity, which makes it possible to use probabilistic topic information to extend the features of short texts. To make full use of the LDA topic model, this paper proposes a short text classification method based on a probabilistic topic model and mutual text expansion. First, short texts are mutually expanded using their own semantic information. Next, their features are extended using the "document-topic" and "topic-word" distributions predicted by the LDA topic model, together with the relationships among distinct words in the short texts. Finally, a support vector machine (SVM) classifies the short texts. The analysis and validation results show that, compared with representing short texts using the vector space model (VSM) alone, the proposed method effectively improves classification performance across different categories of short texts.
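Keywords: short text; probabilistic topic; feature extension; Latent Dirichlet Allocation; support vector machine
Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]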
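The topic-based feature extension step named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual expansion rule, so everything below is an assumption: the function name `expand_short_text`, the parameters `top_topics` and `top_words`, the weighting (topic probability times word probability), and the hand-written toy distributions are all hypothetical. The mutual-expansion step and the SVM classifier are omitted.

```python
from collections import Counter


def expand_short_text(tokens, doc_topic, topic_word, top_topics=1, top_words=2):
    """Extend a short text's bag of words with words from its dominant topics.

    tokens     : list of words in the short text
    doc_topic  : dict topic -> P(topic | document)   (assumed to come from LDA)
    topic_word : dict topic -> dict word -> P(word | topic)   (assumed from LDA)
    Hypothetical rule: each added word is weighted by
    P(topic | document) * P(word | topic).
    """
    # Start from the plain term counts (the VSM-style representation).
    features = Counter(tokens)

    # Pick the topics with the highest probability for this document.
    ranked_topics = sorted(doc_topic, key=doc_topic.get, reverse=True)[:top_topics]

    for topic in ranked_topics:
        word_probs = topic_word[topic]
        # Pick the most probable words of that topic.
        best_words = sorted(word_probs, key=word_probs.get, reverse=True)[:top_words]
        for word in best_words:
            features[word] += doc_topic[topic] * word_probs[word]

    return dict(features)


# Toy distributions, written by hand for illustration only.
doc_topic = {"ml": 0.8, "sports": 0.2}
topic_word = {
    "ml": {"classifier": 0.5, "kernel": 0.3, "training": 0.2},
    "sports": {"match": 0.6, "team": 0.4},
}

expanded = expand_short_text(["svm", "kernel"], doc_topic, topic_word)
print(expanded)
```

In this toy run the two-word text gains the topic word "classifier" and a higher weight on "kernel", so the resulting vector is less sparse than the raw term counts. In a full pipeline the two distributions would come from a trained LDA model, and the expanded vectors would be passed to an SVM.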