Research on Text Classification Based on LSI and SVM (基于LSI和SVM的文本分类研究)

Cited by: 8

Author: 刘美茹 (Liu Meiru)

Source: Computer Engineering (《计算机工程》), 2007, No. 15, pp. 217-219 (3 pages)

Abstract: Text classification is the foundation and core of text data mining, and a concrete application of natural language processing and machine learning. Feature selection and the classification algorithm are its two most critical components. This paper proposes using latent semantic indexing (LSI) for feature extraction and dimensionality reduction, combined with a support vector machine (SVM) for multi-class classification. Experiments show that this method outperforms both a vector space model (VSM) combined with SVM and LSI combined with K-nearest neighbor (KNN). When the number of categories is small and the categories are clearly separated, the method performs well enough for practical use.
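
The combination described in the abstract (LSI for dimensionality reduction, then an SVM for multi-class classification) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes scikit-learn, TF-IDF term weighting, a linear kernel, and a tiny made-up English corpus with three hypothetical categories. The abstract specifies none of these choices. `TruncatedSVD` applied to a term-document matrix is the usual way to compute LSI in scikit-learn.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from sklearn.svm import LinearSVC

# Hypothetical toy corpus (not from the paper): three clearly separated classes.
train_docs = [
    "the team won the football match in the final minute",
    "the striker scored two goals and the coach praised the team",
    "the central bank raised interest rates to fight inflation",
    "stock markets fell as investors worried about interest rates",
    "the new processor doubles memory bandwidth for laptops",
    "the software update fixes a bug in the operating system kernel",
]
train_labels = ["sports", "sports", "finance", "finance", "tech", "tech"]

test_docs = [
    "the coach said the team will win the next match",
    "investors expect the bank to cut interest rates",
]


def build_lsi_svm(n_topics=3, seed=0):
    """Build a VSM -> LSI -> SVM pipeline (all parameter values are assumptions)."""
    return make_pipeline(
        TfidfVectorizer(),                                       # vector space model (term weights)
        TruncatedSVD(n_components=n_topics, random_state=seed),  # LSI: project onto latent topics
        Normalizer(copy=False),                                  # unit-length vectors in topic space
        LinearSVC(),                                             # linear SVM, one-vs-rest for multi-class
    )


model = build_lsi_svm()
model.fit(train_docs, train_labels)
predictions = model.predict(test_docs)

# Dimensionality before and after LSI, to show the reduction step.
vocab_size = len(model.named_steps["tfidfvectorizer"].vocabulary_)
lsi_dims = model.named_steps["truncatedsvd"].components_.shape[0]
print(f"vocabulary size: {vocab_size}, LSI dimensions: {lsi_dims}")
print(list(zip(test_docs, predictions)))
```

On a real corpus the number of latent dimensions is usually in the hundreds and is tuned on held-out data. Replacing `LinearSVC` with `KNeighborsClassifier`, or dropping the `TruncatedSVD` step, gives rough analogues of the LSI+KNN and VSM+SVM baselines the abstract mentions.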

Keywords: feature extraction; latent semantic indexing; support vector machine

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
