An Improved Incremental Bayesian Text Classification Algorithm (一种改进的增量式贝叶斯文本分类算法)  Cited by: 4


Authors: 吴国文 (Wu Guowen)[1], 庄千料 (Zhuang Qianliao)

Affiliation: [1] School of Computer Science and Technology, Donghua University, Shanghai 201620

Source: Computer Applications and Software (《计算机应用与软件》), 2017, No. 6, pp. 226-229, 249 (5 pages)

Funding: National Natural Science Foundation of China (61472075)

Abstract: Large labeled training sets are hard to obtain, so this paper applies incremental Bayesian learning to small training sets and proposes a new sequential learning algorithm to address a weakness in the learning sequence: prior knowledge is not fully used, which lets noisy data keep propagating. For sample selection during incremental learning, the algorithm introduces the paired-sample test and class support, which exploit prior knowledge from the horizontal and vertical perspectives respectively to select the optimal incremental subset for refining the classifier, so that the classifier parameters are strengthened during dynamic learning. Experimental results show that the algorithm effectively weakens the negative influence of noisy data, improves classification accuracy, and greatly reduces incremental learning time.

Keywords: incremental learning; Bayesian classification; paired-sample test; class support

Classification code: TP3 [Automation and Computer Technology: Computer Science and Technology]
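
Illustrative sketch: the abstract describes incremental Bayesian learning in which only a selected subset of new samples is folded into the classifier. The Python below is a minimal sketch of that general idea, not the authors' algorithm. The abstract does not define the paired-sample test or class support, so a simple log-posterior margin threshold stands in for the selection step. The names `IncrementalNB`, `select_and_learn`, and `margin`, and the toy documents, are hypothetical, and at least two classes are assumed to be known.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial naive Bayes that keeps raw counts so it can be updated in place."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def update(self, tokens, label):
        """Fold one labeled document into the counts (the incremental step)."""
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def log_posteriors(self, tokens):
        """Unnormalized log P(c) + sum of log P(w|c) for every known class."""
        total_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        scores = {}
        for c, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[c].values())
            score = math.log(n_docs / total_docs)
            for w in tokens:
                num = self.word_counts[c][w] + self.alpha
                score += math.log(num / (total_words + self.alpha * v))
            scores[c] = score
        return scores

    def predict(self, tokens):
        scores = self.log_posteriors(tokens)
        return max(scores, key=scores.get)


def select_and_learn(model, unlabeled, margin=1.0):
    """Learn only from confidently classified documents; return the rest.

    ASSUMPTION: the margin test is a stand-in for the paper's selection
    criteria (paired-sample test and class support), which are not given
    in the abstract.
    """
    accepted, remaining = [], []
    for tokens in unlabeled:
        ranked = sorted(model.log_posteriors(tokens).items(),
                        key=lambda kv: kv[1], reverse=True)
        (best_label, best), (_, second) = ranked[0], ranked[1]
        if best - second >= margin:
            accepted.append((tokens, best_label))
        else:
            remaining.append(tokens)
    # Update after scoring so the whole batch is judged by the same parameters.
    for tokens, label in accepted:
        model.update(tokens, label)
    return remaining


# Toy usage with a two-document seed training set.
model = IncrementalNB()
model.update(["goal", "match", "team"], "sports")
model.update(["stock", "market", "price"], "finance")
left = select_and_learn(model, [["team", "goal", "goal"], ["new", "report"]])
```

In this toy run the first unlabeled document is accepted and absorbed into the counts, while the second, which contains no known words, stays unlabeled. Holding back low-confidence samples is one simple way to limit the noise propagation that the abstract identifies.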

 
