Authors: 徐佳丽 (XU Jia-li); 杨长红 (YANG Chang-hong)
Affiliations: [1] School of Electronic and Information Engineering, Nanchang Normal College of Applied Technology, Nanchang, Jiangxi 330038, China; [2] School of Mathematics and Computer Science, Jiangxi Science and Technology Normal University, Nanchang, Jiangxi 330038, China
Source: Computer Simulation (《计算机仿真》), 2024, No. 3, pp. 510-513, 518 (5 pages)
Abstract: Network information in electronic text form is not only large in volume, but its mixed features are also highly similar, which makes an even distribution of features hard to achieve. The uneven distribution of feature items across categories biases the calculation of text weights and hampers the extraction of category feature words, making text classification difficult. To address this, the paper proposes a fast classification method for hybrid-feature text based on keyword weighting. Text is first weighted with the term frequency-inverse document frequency (TF-IDF) information retrieval method, and the frequency with which text keywords occur in the central set is calculated under different weights. Key features are then extracted according to a frequency threshold, and the class center points of the text set are determined. The text data most strongly correlated with each class center are computed, and correlation features are extracted. Finally, a neural network classification model is built: a set of texts with detailed features is defined in advance and fed to the network as initial values, and each layer compares its input against the target features in turn to carry out the classification. Experiments show that the proposed method achieves a higher recall, with the recall of mixed text feature extraction exceeding 40%, indicating better application performance and accurate classification across different kinds of text sets.
Keywords: keyword weighting; hybrid-feature text; frequency threshold; neural network classification model
Classification number: TP327 [Automation and Computer Technology: Computer System Architecture]
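
Note: the abstract names TF-IDF weighting followed by threshold-based selection of key features, but this record gives no formulas. The sketch below is only an illustration of those two steps using the textbook TF-IDF definition (term frequency times the log of inverse document frequency). The function names `tfidf_weights` and `key_features`, the sample documents, and the threshold value 0.2 are all assumptions made for the example and are not taken from the paper. It also thresholds the TF-IDF weight directly, which may differ from the paper's frequency-in-central-set criterion.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {term: weight} dict per tokenized document (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


def key_features(weights, threshold):
    """Keep the terms whose weight reaches the (hypothetical) threshold."""
    return [{t for t, w in doc.items() if w >= threshold} for doc in weights]


# Toy corpus, invented for illustration only.
docs = [
    ["neural", "network", "text", "classification"],
    ["keyword", "weighting", "text", "feature"],
    ["text", "retrieval", "keyword", "index"],
]
weights = tfidf_weights(docs)
features = key_features(weights, threshold=0.2)
```

In this toy corpus, "text" occurs in every document and therefore gets weight zero, so it is never selected as a key feature; terms unique to one document receive the highest weights.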