A Text Classifier Based on the Analysis of Word-Usage Features in Illegal Texts (cited by: 1)

A Text Category Method based on the Analysis of the Characters of Words in Illegal Texts

Affiliations: [1] Dalian Jiaotong University, Dalian 116052; [2] Shanxi University, Taiyuan 030006

Source: Computer Development & Applications (《电脑开发与应用》), 2006, No. 10, pp. 2-3, 6 (3 pages in total)

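Funding: Supported by the National Natural Science Foundation of China (60475022) and the Natural Science Foundation of Shanxi Province (20041041)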

Abstract: To address unhealthy content on the Internet, this paper statistically analyzes the form and frequency of the words characteristic of such texts and proposes an automatic special-word recognition algorithm based on symbol-density computation. First, statistics over the training texts yield an initial special-word list that serves as the basis for recognition. During text classification, a special-word recognition algorithm with two filtering passes dynamically updates the special-word list and its weights, so that special-word information is combined with a binary text classifier to improve the recognition accuracy for unhealthy texts. The results show that adding automatic special-word recognition and judgment effectively improves the recognition accuracy for illegal texts.
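The abstract does not define symbol density, the two filtering passes, or the weight-update rule, so the following is only a minimal sketch under stated assumptions rather than the authors' algorithm. It assumes that symbol density is the share of non-alphanumeric characters in a token (as in an obfuscated word such as `c*a*s*i*n*o`), that the first pass keeps tokens above a density threshold, and that the second pass keeps tokens whose symbol-stripped core is already in the special-word list. The seed list, weights, thresholds, and all function names are hypothetical.

```python
# Hypothetical seed list: special word -> weight (assumed values, not from the paper).
special_words = {"casino": 1.0, "lottery": 0.8}

def symbol_density(token: str) -> float:
    """Fraction of characters in `token` that are neither letters nor digits (assumed definition)."""
    if not token:
        return 0.0
    symbols = sum(1 for ch in token if not ch.isalnum())
    return symbols / len(token)

def strip_symbols(token: str) -> str:
    """Remove every non-alphanumeric character and lowercase the rest."""
    return "".join(ch for ch in token if ch.isalnum()).lower()

def update_special_words(text, words, min_density=0.3, boost=0.1):
    """Assumed two-pass screening: a density filter, then a match against the list."""
    updated = dict(words)  # leave the caller's table untouched
    for token in text.split():
        # Pass 1: keep only tokens padded with enough symbols.
        if symbol_density(token) < min_density:
            continue
        # Pass 2: keep only tokens whose stripped core is a known special word.
        core = strip_symbols(token)
        if core in updated:
            updated[core] += boost  # assumed additive weight update
    return updated

def special_word_score(text, words):
    """Sum the weights of special words found after symbol stripping."""
    return sum(words.get(strip_symbols(t), 0.0) for t in text.split())

def classify(text, words, base_score, threshold=1.0):
    """Combine a base binary-classifier score with the special-word score."""
    return base_score + special_word_score(text, words) >= threshold

updated = update_special_words("win at c*a*s*i*n*o today", special_words)
flagged = classify("c*a*s*i*n*o", updated, base_score=0.0)
```

Here `base_score` stands in for the output of whatever binary text classifier is used; adding the special-word score to it is one simple way to combine the two signals, chosen for illustration only.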

Keywords: special words; feature analysis; symbol density; automatic recognition; binary text classifier

Classification: TP393 [Automation and Computer Technology: Computer Application Technology]
