Author: HAN Liqi (韩丽琪) (China Petroleum Daily Co., Ltd., Beijing 100020, China)
Source: 《计算机应用文摘》, 2025, No. 8, pp. 111-113, 117 (4 pages in total)
Abstract: With the explosive growth of Internet information, newspapers face the challenge of processing massive volumes of news data. An intelligent classification algorithm based on feature clustering and dimensionality reduction offers a new approach to this problem. The algorithm first uses the ICTCLAS system to segment news text, removing stop words and distinguishing parts of speech. It then applies a weight function to reduce the feature dimension and shrink the keyword set. Finally, it uses K-means clustering to cluster and classify the text features. The algorithm was tested on four standard datasets (TU95, YU75, OP954, and ER9W7). The results show a classification accuracy above 96% and a recall above 98%, improvements of about 14% and 18%, respectively, over mainstream algorithms such as BERT-CNN and graph attention networks.
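The three stages named in the abstract (segmentation with stop-word removal, weight-based keyword reduction, K-means clustering) can be illustrated with a minimal sketch. It is not the author's implementation, and several parts are assumptions: whitespace splitting stands in for ICTCLAS segmentation, the stop-word list is hypothetical, summed TF-IDF stands in for the unspecified weight function, and the sample sentences and the names `tokenize`, `select_keywords`, `vectorize`, `kmeans`, and `cluster_news` are invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stop-word list; a real system would load a full lexicon.
STOP_WORDS = {"the", "a", "of", "in", "and", "to"}


def tokenize(text):
    """Stand-in for ICTCLAS: whitespace split plus stop-word removal."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def select_keywords(docs_tokens, top_k):
    """Keep the top_k terms by summed TF-IDF (an assumed weight function)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter(t for toks in docs_tokens for t in set(toks))
    weight = Counter()
    for toks in docs_tokens:
        for term, count in Counter(toks).items():
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            weight[term] += (count / len(toks)) * idf
    # Sort by descending weight, then alphabetically so ties are deterministic.
    return sorted(weight, key=lambda t: (-weight[t], t))[:top_k]


def vectorize(tokens, keywords):
    """Count the retained keywords in one document and L2-normalise."""
    counts = Counter(tokens)
    vec = [float(counts[k]) for k in keywords]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain K-means with deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_news(texts, top_k, k):
    docs_tokens = [tokenize(t) for t in texts]
    keywords = select_keywords(docs_tokens, top_k)
    vectors = [vectorize(toks, keywords) for toks in docs_tokens]
    return keywords, kmeans(vectors, k)


# Invented sample headlines: two about sport, two about finance.
docs = [
    "team wins the football match in final",
    "football team loses match",
    "stock market falls and bank shares drop",
    "bank raises rates stock market reacts",
]
keywords, labels = cluster_news(docs, top_k=6, k=2)
print(keywords)
print(labels)
```

Reducing the vocabulary before clustering keeps the vectors short, which is the role the abstract assigns to the weight function. Note that K-means only groups documents; mapping clusters to named news categories, as a classifier would, needs an additional labelling step that the abstract does not describe.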
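Keywords: feature clustering; dimensionality reduction; news text; intelligent classification; ICTCLAS; weight function
Classification number: TP393 [Automation and Computer Technology: Computer Application Technology]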