Class-specific text classification algorithm based on exponential family  (Cited by: 2)


Authors: LIU Yun (刘云) [1]; HUANG Rongcheng (黄荣乘) (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650050, P.R. China)

Affiliation: [1] Faculty of Information Engineering and Automation, Kunming University of Science and Technology

Source: Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition), 2019, No. 5, pp. 694-701 (8 pages)

Funding: National Natural Science Foundation of China (61262040)

Abstract: In text classification, choosing an efficient classification algorithm is the key to improving classification accuracy and shortening classification time. This paper proposes a class-specific multinomial naive Bayes classification algorithm based on the exponential family (exponential family-multinomial naive Bayes, EF-MNB). The distributions of the N classes are constructed from the multinomial model. A class-specific feature selection algorithm yields the feature subset of the N-th class and the feature probability density function (PDF) of the corresponding class. The exponential family is then used to construct the original PDF estimate expressions for the N classes. Given the training sets of the N classes, the optimal PDF estimate for the N-th class is obtained, and the classification rule is formulated from Bayes' theorem. Simulation results show that, compared with the hierarchical analysis classification algorithm based on latent Dirichlet allocation and support vector machine (LDA-SVM), the improved hyper-sphere support vector machine (IHS-SVM) text classification algorithm, and the hybrid classification algorithm based on principal component analysis and k-nearest-neighbor (PCA-KNN), the EF-MNB class-specific classification algorithm achieves higher classification accuracy in a small amount of time.
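The abstract names three ingredients: per-class multinomial estimates, a class-specific feature subset, and a Bayes decision rule. The sketch below shows one way these can fit together on toy data. It is not the paper's EF-MNB: the record gives no formulas, so the pooled reference distribution `ref`, the log-ratio scoring (used here so that scores computed on different feature subsets stay comparable), the top-`k` selection criterion, the Laplace smoothing constant `alpha`, and the function names `train` and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter


def train(docs, labels, k=3, alpha=1.0):
    """Fit smoothed multinomial word probabilities per class and for the pooled corpus.

    Assumption: each class keeps the k words whose class probability most
    exceeds the pooled (reference) probability. This is an illustrative
    selection rule, not the one used in the paper.
    """
    vocab = sorted({w for d in docs for w in d})
    pooled = Counter(w for d in docs for w in d)
    total = sum(pooled.values())
    # Reference (class-independent) word probabilities with Laplace smoothing.
    ref = {w: (pooled[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

    model = {}
    for c in sorted(set(labels)):
        counts = Counter(w for d, y in zip(docs, labels) if y == c for w in d)
        n_c = sum(counts.values())
        prob = {w: (counts[w] + alpha) / (n_c + alpha * len(vocab)) for w in vocab}
        # Rank words by log(class probability / reference probability), descending.
        ranked = sorted(vocab, key=lambda w: (-math.log(prob[w] / ref[w]), w))
        subset = set(ranked[:k])
        log_prior = math.log(labels.count(c) / len(labels))
        model[c] = (log_prior, prob, subset)
    return model, ref


def classify(doc, model, ref):
    """Return the class with the highest log prior plus class-specific log-ratio score."""
    scores = {}
    for c, (log_prior, prob, subset) in model.items():
        # Only words in this class's own subset contribute; unseen words are skipped.
        scores[c] = log_prior + sum(
            math.log(prob[w] / ref[w]) for w in doc if w in subset
        )
    return max(scores, key=scores.get)


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["stock", "market", "price"],
    ["market", "trade", "price"],
]
labels = ["sports", "sports", "finance", "finance"]

model, ref = train(docs, labels, k=3)
print(classify(["goal", "team", "win"], model, ref))
```

Dividing by a shared reference distribution is one common way to compare likelihoods evaluated on different per-class feature subsets; whether EF-MNB does this, or something else, cannot be determined from the abstract alone.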

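Keywords: exponential family; class-specific feature selection; class-conditional probability density function; multinomial naive Bayes classifier; text classification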

Classification No.: TP391 [Automation and Computer Technology: Computer Application Technology]

 
