Author: GUAN Jianghong (管江红)[1] (School of Information Engineering, Xizang Minzu University, Xianyang 712082, China)
Affiliation: [1] School of Information Engineering, Xizang Minzu University
Source: Modern Electronics Technique (《现代电子技术》), 2019, No. 21, pp. 182-186 (5 pages)
Abstract: Existing feature selection methods based on the χ² statistic perform poorly in harmful text filtering. To address this problem, this paper proposes a feature selection method for harmful text filtering that uses two-layer classification to improve feature selection. First, an improved inverse document frequency distinguishes how a feature term is distributed within its own category compared with the other categories. Second, an inverse category frequency is introduced to compensate for insufficient suppression strength. Finally, an inverse upper-layer category frequency is added so that two highly similar second-layer categories can be clearly separated. The improved method remedies the weakness of the existing χ² statistic in discriminating intra-class and inter-class distributions of feature terms. When applied to harmful text filtering, it closely matches the feature selection requirements of that task. A comparison of evaluation metrics shows that the proposed method performs better in harmful text filtering.
Keywords: feature selection; χ² statistic; two-layer classification; harmful text filtering; feature term distribution; evaluation metrics
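Classification (CLC): TN911.1 [Electronics and Telecommunications: Communication and Information Systems]; 34 [Electronics and Telecommunications: Information and Communication Engineering]; TP18 [Automation and Computer Technology: Control Theory and Control Engineering]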
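The abstract names weighting factors layered on top of the χ² statistic but gives no formulas. The Python sketch below is a rough illustration only. It computes the standard χ² score for each term/category pair from a 2x2 table of document counts. It then multiplies that score by two smoothed inverse frequencies: an inverse category frequency (ICF) over second-layer categories and an inverse upper-layer category frequency (IUCF) over first-layer categories.

Several parts of the sketch are assumptions and are not the author's formulas:

- the smoothing log(1 + total/count);
- the multiplicative combination;
- the function names `chi_square`, `inverse_frequency`, and `score_terms`;
- the toy corpus and category hierarchy.

The paper's improved inverse document frequency is omitted because this record does not define it.

```python
import math
from collections import defaultdict


def chi_square(A: int, B: int, C: int, D: int) -> float:
    """Standard chi-square statistic for one term/category 2x2 table.
    A: docs in the category that contain the term
    B: docs outside the category that contain the term
    C: docs in the category without the term
    D: docs outside the category without the term
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    if denom == 0:
        return 0.0
    return N * (A * D - B * C) ** 2 / denom


def inverse_frequency(total_groups: int, groups_with_term: int) -> float:
    """Hypothetical smoothed inverse frequency, log(1 + total / with_term).
    Used here both as ICF (second layer) and IUCF (upper layer)."""
    if groups_with_term == 0:
        return 0.0
    return math.log(1 + total_groups / groups_with_term)


def score_terms(docs, parent):
    """docs: list of (set_of_terms, second_layer_label) pairs.
    parent: maps each second-layer label to its upper-layer label (assumed).
    Returns {(term, label): chi2 * ICF * IUCF} (assumed combination)."""
    labels = sorted({label for _, label in docs})
    uppers = {parent[label] for label in labels}
    n_docs = len(docs)
    class_size = defaultdict(int)   # number of docs per second-layer label
    df = defaultdict(int)           # docs containing term t within label l
    term_df = defaultdict(int)      # docs containing term t overall
    for terms, label in docs:
        class_size[label] += 1
        for t in terms:
            df[(t, label)] += 1
            term_df[t] += 1

    scores = {}
    for t in sorted(term_df):
        cats_with_t = [l for l in labels if df[(t, l)] > 0]
        ups_with_t = {parent[l] for l in cats_with_t}
        icf = inverse_frequency(len(labels), len(cats_with_t))
        iucf = inverse_frequency(len(uppers), len(ups_with_t))
        for l in labels:
            A = df[(t, l)]
            B = term_df[t] - A
            C = class_size[l] - A
            D = n_docs - A - B - C
            scores[(t, l)] = chi_square(A, B, C, D) * icf * iucf
    return scores


# Toy two-layer data (hypothetical): "gambling" and "fraud" share the
# upper-layer class "harmful"; "normal" is its own upper-layer class.
docs = [
    ({"gamble", "bet", "win"}, "gambling"),
    ({"bet", "odds"}, "gambling"),
    ({"scam", "win", "prize"}, "fraud"),
    ({"scam", "transfer"}, "fraud"),
    ({"weather", "news"}, "normal"),
    ({"news", "sports", "win"}, "normal"),
]
parent = {"gambling": "harmful", "fraud": "harmful", "normal": "normal"}

scores = score_terms(docs, parent)
for label in ("gambling", "fraud"):
    ranked = sorted(((s, t) for (t, l), s in scores.items() if l == label),
                    reverse=True)
    print(label, [t for _, t in ranked[:3]])
```

In the toy data, `bet` appears only in "gambling" and receives a high score. `win` appears in every category independently of the class, so its χ² is zero. The extra inverse-frequency factors push further in the same direction by down-weighting terms spread across many categories. Note that plain χ² also rewards negatively correlated terms (AD < BC). The `bet`/"fraud" pair, for example, scores above zero.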