Harmful text filtering feature selection method based on χ^2 statistics

Cited by: 1


Author: GUAN Jianghong[1] (School of Information Engineering, Xizang Minzu University, Xianyang 712082, China)

Affiliation: [1] School of Information Engineering, Xizang Minzu University

Source: Modern Electronics Technique (《现代电子技术》), 2019, No. 21, pp. 182-186 (5 pages)

Abstract: Existing feature selection methods based on the χ^2 statistic perform poorly in harmful text filtering. To address this, a feature selection method for harmful text filtering is proposed that uses double-layer classification to improve the χ^2-based approach. First, the inverse document frequency is modified to distinguish how a feature item is distributed within its own category from how it is distributed across the other categories. Second, the inverse category frequency is introduced to compensate for the suppression strength. Finally, the inverse upper-layer category frequency is added so that any two second-layer categories with high similarity can be clearly separated. The improved method compensates for the weakness of the existing χ^2 statistic in discriminating the intra-class and inter-class distribution of feature items, and when applied to harmful text filtering it closely matches the feature selection requirements of that task. A comparison of evaluation indexes shows that the proposed method performs better in harmful text filtering.
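For orientation, the baseline that the abstract sets out to improve can be sketched as follows. This is the textbook χ^2 term-category score computed from a 2x2 contingency table, not the paper's modified formula: the abstract does not give the definitions of the modified inverse document frequency, the inverse category frequency, or the inverse upper-layer category frequency, so none of them are reproduced here. The function name `chi_square` and all document counts are hypothetical.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Textbook chi-square score of a term t for a category k.

    a: documents in k that contain t
    b: documents outside k that contain t
    c: documents in k that do not contain t
    d: documents outside k that do not contain t
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # Degenerate table (an empty row or column): no evidence either way.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical counts: a term concentrated in the category ...
concentrated = chi_square(40, 10, 10, 40)
# ... and a term that mostly avoids the category.
avoiding = chi_square(10, 40, 40, 10)
# A term spread evenly across both groups.
uniform = chi_square(25, 25, 25, 25)

print(concentrated, avoiding, uniform)
```

With these made-up counts the first two scores are identical, because the squared term discards the sign of the association. That is one commonly cited way in which the plain statistic fails to reflect how a feature item is distributed inside versus outside a category, which is the kind of gap the correction factors named in the abstract are meant to address.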

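Keywords: feature selection; χ^2 statistic; double-layer classification; harmful text filtering; feature item distribution; evaluation index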

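Classification codes: TN911.1 [Electronics and Telecommunications - Communication and Information Systems]; 34 [Electronics and Telecommunications - Information and Communication Engineering]; TP18 [Automation and Computer Technology - Control Theory and Control Engineering]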

 
