Authors: 但唐朋, 许天成, 张姝涵 (DAN Tangpeng; XU Tiancheng; ZHANG Shuhan), School of Computer, Central China Normal University, Wuhan 430079
Source: 《计算机与数字工程》 (Computer & Digital Engineering), 2020, No. 3, pp. 556-560 (5 pages)
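Funding: Supported by the National Undergraduate Innovation and Entrepreneurship Training Program at Central China Normal University (No. 201810511002) and the college-level Undergraduate Innovation and Entrepreneurship Training Program at Central China Normal University (No. CA20180418221834349C).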
Abstract: With the development of Internet technology, people can not only obtain information from the network but also express personal opinions and share their own experiences on it. Since Web 2.0, the network has changed from a read-only medium into today's interactive one, and the volume of online information has grown geometrically along with it. Text is an important part of that information and can be divided into categories such as news, entertainment, commentary, and finance. Chinese text classification not only facilitates building text corpora but can also be applied in other data mining areas. This paper designs an automatic Chinese text classification system based on improved TF-IDF features combined with an SVM model. Experiments show that, compared with traditional feature extraction, the improved TF-IDF feature strategy achieves higher classification accuracy, and that the system's accuracy meets practical needs.
Keywords: text classification; natural language processing; BOW model; machine learning; improved TF-IDF features
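
This record does not say how the TF-IDF weighting was improved or how the SVM was configured. For orientation only, the sketch below computes standard, unmodified TF-IDF weights over a bag-of-words representation, that is, the conventional weighting on which the "improved TF-IDF" named in the abstract builds. It is not the authors' method. The function name `tfidf` and the toy documents are assumptions made for illustration, and the documents are taken to be already word-segmented.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Standard weighting only: tf = count / document length,
    idf = log(N / document frequency). The paper's improved
    variant is not described in this record and is not shown here.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, pre-segmented documents (hypothetical data; a real system would
# first run a Chinese word segmenter over the raw text).
docs = [
    ["股票", "市场", "上涨"],
    ["电影", "市场", "票房"],
    ["股票", "基金", "收益"],
]
vectors = tfidf(docs)
```

Each resulting dictionary is a sparse feature vector; in a pipeline like the one the abstract outlines, such vectors would be passed to a classifier such as an SVM. A term that occurs in every document receives weight zero, which is why very common words contribute nothing to the classification.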