A New Text Classification Algorithm Based on Feature Weight  Cited by: 1


Author: HU Xiaohui (胡晓辉)

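Affiliation: [1] School of Information Engineering, Jiangxi Vocational College of Mechanical & Electrical Technology, Nanchang 330013, China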

Source: Technology Innovation and Application (《科技创新与应用》), 2023, No. 4, pp. 39-42 (4 pages)

Funding: Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ204203).

Abstract: Automatic text classification, an important branch of natural language processing, is a foundation of text information processing and an active topic in artificial intelligence research, and it supports the management of text information. Traditional algorithms such as naive Bayes, neural networks, support vector machines (SVM), and k-nearest neighbors (KNN) have been studied extensively. However, experiments have shown that classical text classification algorithms such as KNN and SVM are mostly based on the vector space model, and their limited generalization ability leads to poor results on complex text classification tasks. This paper proposes a new feature weight calculation method that makes full use of text structure information: words appearing at different positions receive different weights, so that words in key positions are emphasized. The method also accounts for the influence of word distribution density on classification results by adding a word density weight to the classification model, thereby optimizing the TF-IDF algorithm. Experiments on two corpora show that the proposed feature-weight-based classification algorithm considerably improves classification performance.

Keywords: text classification; feature selection; natural language processing; category information; vector space model

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
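
Illustrative sketch: the abstract describes a TF-IDF variant that weights terms by where they occur and by how widely they are spread through a document, but this record does not give the paper's formulas. The Python sketch below shows one way such a scheme could look. Everything specific in it is an assumption made for illustration and is not taken from the paper: the position names and their weights (`POSITION_WEIGHTS`), the segment-based density measure (`distribution_density`), the smoothed IDF, and the multiplicative combination in `feature_weights`.

```python
import math
from collections import Counter

# Hypothetical position weights (not from the paper): title terms count more.
POSITION_WEIGHTS = {"title": 3.0, "body": 1.0}


def position_weighted_tf(doc):
    """Sum term occurrences, scaling each by the weight of its position.

    `doc` maps a position name (e.g. "title", "body") to a list of tokens.
    """
    tf = Counter()
    for position, tokens in doc.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            tf[token] += weight
    return tf


def distribution_density(doc, term, n_segments=4):
    """Assumed density measure: share of document segments containing `term`."""
    tokens = [t for toks in doc.values() for t in toks]
    if not tokens:
        return 0.0
    size = math.ceil(len(tokens) / n_segments)
    segments = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    hits = sum(1 for segment in segments if term in segment)
    return hits / len(segments)


def feature_weights(corpus):
    """Return one {term: weight} dict per document.

    weight = position-weighted TF * smoothed IDF * (1 + density),
    an assumed combination chosen only to illustrate the idea.
    """
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update({t for toks in doc.values() for t in toks})
    result = []
    for doc in corpus:
        tf = position_weighted_tf(doc)
        weights = {}
        for term, tf_value in tf.items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            density = distribution_density(doc, term)
            weights[term] = tf_value * idf * (1.0 + density)
        result.append(weights)
    return result


# Toy corpus, invented for this example.
corpus = [
    {"title": ["text", "classification"],
     "body": ["feature", "weight", "text", "model"]},
    {"title": ["image", "retrieval"],
     "body": ["feature", "index", "image", "query"]},
]
weights = feature_weights(corpus)
```

In this toy corpus, "text" appears in the title and body of the first document and in no other document, so it receives a larger weight than "feature", which appears once in the body of both documents. The resulting per-document weight dictionaries could then be fed to any vector-space classifier such as KNN or SVM.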

 
