Simulation of Fast Classification of Hybrid-Feature Text Based on Keyword Weighting

English title as published: Simulation of Fast Text Classification Based on Keyword Weighting


Authors: XU Jia-li (徐佳丽), YANG Chang-hong (杨长红)

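Affiliations: [1] School of Electronic and Information Engineering, Nanchang Normal College of Applied Technology, Nanchang, Jiangxi 330038, China; [2] School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang, Jiangxi 330038, China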

Source: Computer Simulation (《计算机仿真》), 2024, No. 3, pp. 510-513, 518 (5 pages)

Abstract: Network information in the form of electronic text is not only voluminous, but its mixed features are also highly similar to one another, so the features are rarely evenly distributed. This uneven distribution of feature items across categories biases the computation of text weights, which in turn hampers the extraction of category feature words and makes text classification difficult. To address this, this paper proposes a fast classification method for hybrid-feature text based on keyword weighting. First, texts are weighted with the term frequency-inverse document frequency (TF-IDF) information retrieval method, and the frequency with which text keywords appear in the center set is computed under different weights. Second, key features are extracted according to a frequency threshold, and the class center points of the text set are determined. Third, the text data most strongly correlated with each class center are identified, and correlation features are extracted. Finally, a neural network classification model is built: a preset text set containing detailed features is fed into the network as initial values, and each layer compares the input against the target features one by one to achieve effective classification. Experimental results show that the proposed method achieves higher recall, with the recall of mixed-feature extraction exceeding 40%, indicating better practical performance and accurate classification across different kinds of text sets.
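The abstract does not give the exact formulas, so the following is only a minimal Python sketch of the pipeline it outlines, under several assumptions that are not from the paper: input texts are already tokenized; TF-IDF uses a common smoothed IDF variant; the "frequency threshold" is interpreted as the fraction of a class's documents that contain a term (the hypothetical parameter `min_frac`); class centers are mean TF-IDF vectors over the selected keywords; and cosine similarity stands in for the correlation measure. The paper's final stage is a neural network classifier; this sketch substitutes plain nearest-center assignment (highest cosine similarity) to stay short, so it is not the authors' method.

```python
import math
from collections import Counter, defaultdict

def fit_idf(docs):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so no weight is ever zero."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; terms unseen in training are dropped."""
    tf = Counter(doc)
    total = len(doc)
    return {t: (c / total) * idf[t] for t, c in tf.items() if t in idf}

def select_keywords(docs, labels, min_frac):
    """Hypothetical frequency threshold: per class, keep terms that occur
    in at least `min_frac` of that class's documents."""
    by_class = defaultdict(list)
    for doc, y in zip(docs, labels):
        by_class[y].append(set(doc))
    keywords = set()
    for sets in by_class.values():
        counts = Counter(t for s in sets for t in s)
        keywords |= {t for t, c in counts.items() if c / len(sets) >= min_frac}
    return keywords

def class_centers(vectors, labels, keywords):
    """Class center = mean TF-IDF vector of the class, restricted to keywords."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = Counter(labels)
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            if t in keywords:
                sums[y][t] += w
    return {y: {t: w / sizes[y] for t, w in s.items()} for y, s in sums.items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(doc, idf, keywords, centers):
    """Stand-in for the paper's neural network: pick the most correlated center."""
    vec = {t: w for t, w in tfidf(doc, idf).items() if t in keywords}
    return max(centers, key=lambda y: cosine(vec, centers[y]))

# Toy usage with made-up training texts.
train = [
    ("sports", "football match goal team".split()),
    ("sports", "team wins football league".split()),
    ("finance", "stock market price falls".split()),
    ("finance", "market stock investors price".split()),
]
labels = [y for y, _ in train]
docs = [d for _, d in train]
idf = fit_idf(docs)
kw = select_keywords(docs, labels, min_frac=1.0)
centers = class_centers([tfidf(d, idf) for d in docs], labels, kw)
print(classify("goal by the football team".split(), idf, kw, centers))
```

With `min_frac=1.0` only terms shared by every document of a class survive, which mirrors the abstract's idea of filtering out features that are not characteristic of a category; lowering the threshold keeps more, noisier keywords.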

Keywords: keyword weighting; hybrid-feature text; frequency threshold; neural network classification model

CLC number: TP327 [Automation and Computer Technology: Computer System Architecture]
