The Semantic SVM Algorithm for Text Categorization and Its On-line Learning Algorithm (cited by: 2)

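Affiliations: [1] Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094 [2] Research Center of Computer and Language Information Engineering, Chinese Academy of Sciences, Beijing 100083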

Source: Computer Engineering and Applications (《计算机工程与应用》), 2004, No. 36, pp. 11-14, 57 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 60272088)

Abstract: This paper proposes the semantic support vector machine (semantic SVM), an efficient SVM for text categorization. It builds on two observations: SVMs retain good generalization ability even when trained on a small sample set, and in text categorization the features of texts in the same category are distributed in clusters in feature space. A semantic SVM therefore replaces the original training set with a set of semantic centers, which serve as both the training samples and the support vectors. The paper gives the steps for generating the semantic center set, a framework for on-line learning (on-line accumulation of classification knowledge) with semantic SVMs, and an implementation of the on-line learning algorithm based on Sequential Minimal Optimization (SMO). Experiments on a real-life corpus suggest that semantic SVMs are promising: on-line learning and classification are tens of times faster than with standard SVMs and their simple incremental variant, and classification precision is slightly better.
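The abstract does not spell out how the semantic center set is generated, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a simple leader-style clustering: each incoming sample is merged into the nearest center of its own class if it lies within a fixed radius, and otherwise opens a new center. The names `absorb`, `nearest`, `as_training_set`, the `radius` parameter, and the toy two-dimensional feature vectors are all hypothetical.

```python
import math

def nearest(centers, x):
    """Return (index, distance) of the center closest to x, or (None, inf) if empty."""
    best_i, best_d = None, math.inf
    for i, (c, _count) in enumerate(centers):
        d = math.dist(c, x)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

def absorb(centers_by_class, x, label, radius):
    """On-line update (assumed rule): merge x into a nearby same-class center,
    or open a new center when no center lies within `radius`."""
    centers = centers_by_class.setdefault(label, [])
    i, d = nearest(centers, x)
    if i is not None and d <= radius:
        c, n = centers[i]
        # Running mean: the center stays at the centroid of the samples it absorbed.
        centers[i] = ([(cj * n + xj) / (n + 1) for cj, xj in zip(c, x)], n + 1)
    else:
        centers.append((list(x), 1))

def as_training_set(centers_by_class):
    """Flatten the centers into (vector, label) pairs for an SVM trainer."""
    return [(c, label)
            for label, centers in centers_by_class.items()
            for c, _count in centers]

# Toy stream of labelled feature vectors (hypothetical data).
stream = [
    ("sports", [1.0, 0.0]),
    ("sports", [0.8, 0.2]),
    ("finance", [0.0, 1.0]),
    ("finance", [0.2, 0.8]),
    ("sports", [0.1, 0.1]),
]

centers = {}
for label, x in stream:
    absorb(centers, x, label, radius=0.5)

training_set = as_training_set(centers)
print(len(stream), "samples ->", len(training_set), "centers")
```

In this toy run the five samples collapse into three centers. An SVM trainer such as SMO would then be run on `training_set` rather than on the raw samples, which is where the speed-up described in the abstract would come from under these assumptions.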

Keywords: text categorization; support vector machine; semantic SVM; on-line learning

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
