A Chinese Text Classification System Based on Improved TF-IDF Features  (Cited by: 12)


Authors: DAN Tangpeng; XU Tiancheng; ZHANG Shuhan (School of Computer, Central China Normal University, Wuhan 430079)

Affiliation: [1] School of Computer, Central China Normal University, Wuhan 430079

Source: Computer & Digital Engineering, 2020, No. 3, pp. 556-560 (5 pages)

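Funding: Supported by the Central China Normal University National Undergraduate Innovation and Entrepreneurship Training Program (No. 201810511002) and the Central China Normal University College-level Undergraduate Innovation and Entrepreneurship Training Program (No. CA20180418221834349C).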

Abstract: With the development of Internet technology, people can not only obtain information from the network but also express personal opinions and share their own experiences on it. Since Web 2.0, the network has changed from a read-only medium into today's interactive one, and the volume of network information has grown at a geometric rate along with it. Text is an important part of this information, and different texts can be divided into categories such as news, entertainment, commentary, and finance. Chinese text classification not only facilitates the construction of text corpora but can also be applied to other areas of data mining. This paper designs an automatic Chinese text classification system based on improved TF-IDF features combined with an SVM model. Experiments show that, compared with traditional feature extraction methods, the improved TF-IDF feature strategy achieves higher classification accuracy, and that the resulting system meets practical needs.

Keywords: text classification; natural language processing; BOW model; machine learning; improved TF-IDF features

Classification number: P315.69 [Astronomy and Earth Sciences: Seismology]
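
The abstract does not say how the TF-IDF weighting is modified, so the sketch below shows only the standard, unimproved TF-IDF baseline that such a system would be compared against. It is an illustration under assumptions, not the authors' method: the function name `tfidf`, the toy documents, and the choice of `tf = count / length` with `idf = log(N / df)` are all assumptions, and Chinese word segmentation is assumed to have been done beforehand.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using textbook TF-IDF.

    tf(t, d) = occurrences of t in d / number of tokens in d
    idf(t)   = log(N / df(t)), where N is the number of documents and
               df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document it appears in.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-segmented toy documents (illustrative only).
docs = [
    ["股票", "市场", "上涨"],   # stock, market, rise
    ["电影", "上映", "市场"],   # film, release, market
    ["股票", "基金", "股票"],   # stock, fund, stock
]
vectors = tfidf(docs)
```

Each resulting dictionary is a sparse bag-of-words vector. In a pipeline like the one the abstract describes, such vectors would be the input to an SVM classifier. Note that under this plain definition a term occurring in every document receives weight zero.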

 
