Research on Text Classification Technology for Message (短信文本分类技术的研究)

Cited by: 3

Source: Computer Technology and Development (《计算机技术与发展》), 2016, No. 5, pp. 145-148 (4 pages)

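Funding: National Natural Science Foundation of China (11241005); Shanxi Province Higher Education Teaching Reform Research Project (J2012098); Yuncheng University Teaching Reform Research Project (JG201418)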

Abstract: SMS is an important means of communication and plays an increasingly important role. With its widespread use, however, spam SMS seriously disrupts people's lives, so this paper studies SMS classification based on the feature words of the message text. The TF-IDF term weighting method is a classic, widely used algorithm for computing the weights of words in a text, but to simplify computation it ignores the relationships between words. To address this problem, and drawing on the relationships that exist among the words within the same message, the paper adjusts the weight calculation and proposes an SMS text classification algorithm based on fuzzy K-means: the SMS text set is first processed with the TF-IDF algorithm to obtain a vocabulary-text set, and the fuzzy K-means algorithm is then applied to that vocabulary-text set. Experiments verify that the classification results of the fuzzy K-means based algorithm achieve relatively high recall and precision and that the algorithm effectively identifies spam SMS.
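The two-stage pipeline described in the abstract (TF-IDF weighting, then fuzzy K-means over the weighted vectors) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses plain TF-IDF rather than the adjusted weighting the paper proposes (the abstract does not give that formula), it treats fuzzy K-means as standard fuzzy c-means with fuzzifier m = 2, and it splits messages on whitespace, whereas real Chinese SMS text would need word segmentation first. The function names, parameters, and toy messages are all hypothetical.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using plain TF-IDF with L2 normalisation."""
    tokenized = [doc.lower().split() for doc in docs]  # assumption: whitespace tokens
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of messages that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [
            (counts[t] / len(toks)) * math.log(n_docs / df[t]) if counts[t] else 0.0
            for t in vocab
        ]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centers, membership matrix)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([r / total for r in row])
    centers = []
    for _ in range(n_iter):
        # Centres are means of the points weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        new_u = []
        for p in points:
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
            total = sum(inv)
            new_u.append([v / total for v in inv])
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(n) for j in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


def precision_recall(predicted, actual, positive=1):
    """Precision and recall for the class labelled `positive` (e.g. spam)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == positive and a == positive)
    fp = sum(1 for p, a in pairs if p == positive and a != positive)
    fn = sum(1 for p, a in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy messages, for illustration only.
    messages = [
        "win free prize now",
        "free prize call now",
        "meeting at noon today",
        "lunch at noon today",
    ]
    _, vectors = tfidf_vectors(messages)
    _, memberships = fuzzy_c_means(vectors, n_clusters=2)
    for text, row in zip(messages, memberships):
        print(text, [round(x, 3) for x in row])
```

Each message receives a degree of membership in every cluster rather than a single hard label. A hard assignment can be taken as the cluster with the largest membership, and once clusters are mapped to spam and non-spam by some labelled sample, `precision_recall` gives the two measures the abstract reports.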

Keywords: SMS text classification; vector space model; fuzzy clustering; fuzzy K-means

Classification number: TP301 [Automation and Computer Technology: Computer System Architecture]
