Support Vector Machine Trained on Simulated Samples

Research on the Text Classification of Parallel SVM Based on the Simulated Samples


Authors: ZHANG Hong-sheng[1], GAO Hai-bin[1]; Huainan United University (Department of Computer Science and Technology, Huainan 232038, Anhui, China)

Affiliation: [1] Department of Computer Science and Technology, Huainan United University

Source: Journal of Shaoguan University, 2019, No. 12, pp. 13-17 (5 pages)

Funding: Key Natural Science Project of the Anhui Provincial Department of Education (KJ2017A586)

Abstract: In content-based text classification, manually labeled training samples are limited in number and hard to obtain, and converting plain text into vector-form learning samples is time-consuming. To address this, the paper proposes a simulated-sample generation algorithm based on the feature space of a limited set of manually labeled samples and on TF-IDF weight calculation. The algorithm first obtains the feature space of each category through feature extraction, then computes the feature weights with the TF-IDF formula, and finally generates simulated samples with a random algorithm and applies them to text classification with a parallel support vector machine (SVM). Experimental results show that a classifier trained on the simulated samples generated by the algorithm classifies well and that the approach greatly reduces the time needed to generate training samples.

Keywords: simulated samples; support vector machine; text classification; feature space

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
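
The pipeline outlined in the abstract (class feature space, TF-IDF weights, random generation of simulated samples) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the feature-extraction rule, the exact TF-IDF variant, or the random algorithm, so everything below is an assumption made for illustration. That includes the toy corpus `LABELED`, the helper names `class_feature_space` and `simulate_sample`, and the parameters `top_k`, `keep_prob`, and `jitter`. The SVM training step is omitted.

```python
import math
import random
from collections import Counter

# Toy labeled corpus (hypothetical data): tokenized documents per class.
LABELED = {
    "sports": [["match", "team", "goal"], ["team", "coach", "goal", "season"]],
    "finance": [["stock", "market", "bank"], ["bank", "rate", "market", "season"]],
}


def class_feature_space(docs_by_class, top_k=4):
    """Return {label: {term: weight}} with the top_k TF-IDF terms per class.

    Assumption: TF is the term's share of all tokens in the class, and
    IDF is log(N / df) + 1 over all labeled documents.
    """
    all_docs = [doc for docs in docs_by_class.values() for doc in docs]
    n_docs = len(all_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in all_docs for term in set(doc))
    space = {}
    for label, docs in docs_by_class.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values())
        weights = {
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        }
        # Keep the highest-weighted terms; ties are broken alphabetically.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        space[label] = dict(top)
    return space


def simulate_sample(weights, rng, keep_prob=0.8, jitter=0.2):
    """Draw one simulated sparse vector {term: weight} for a class.

    Assumption: each feature is kept with probability keep_prob and its
    weight is perturbed by up to +/- jitter (relative).
    """
    sample = {}
    for term, weight in weights.items():
        if rng.random() < keep_prob:
            sample[term] = weight * (1.0 + rng.uniform(-jitter, jitter))
    if not sample:
        # Guarantee at least one feature: fall back to the strongest term.
        best = max(weights, key=weights.get)
        sample[best] = weights[best]
    return sample


SPACE = class_feature_space(LABELED)
rng = random.Random(0)  # fixed seed so the run is reproducible
SAMPLES = [
    (simulate_sample(SPACE[label], rng), label)
    for label in SPACE
    for _ in range(5)
]
```

Each entry of `SAMPLES` is a sparse vector paired with its class label, so the list could be converted to a matrix and passed to any SVM implementation without tokenizing or vectorizing additional raw text, which is where the abstract locates the time savings.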

 
