Authors: WU Junhua (吴俊华); TAN Bojue (谭博觉); GAO Qie (高切); CHEN Musheng (陈木生) (School of Software Engineering, Jiangxi University of Science and Technology, Nanchang 330013, China)
Source: Journal of Chongqing University of Technology: Natural Science (《重庆理工大学学报(自然科学)》), 2021, No. 7, pp. 283-290 (8 pages)
Funding: Science and Technology Research Fund Project of the Jiangxi Provincial Department of Education (GJJ180450).
Abstract: To address the "curse of dimensionality" and the imbalanced classification problem in web spam detection, an instance-weighted K nearest neighbor classifier combined with optimal Fisher feature selection is proposed. First, Fisher feature selection is performed on the training dataset and all features are sorted by Fisher score in descending order. Features are then taken in that order, one at a time, and the training dataset is classified with the instance-weighted K nearest neighbor classifier; a feature is retained only if it increases the AUC of the training dataset's classification results. Finally, the testing dataset is classified with the instance-weighted K nearest neighbor classifier using the retained optimal feature subset. Experiments on the WEBSPAM UK-2006 dataset show that the proposed method clearly outperforms traditional classifiers such as decision tree, support vector machine, naive Bayes, and K nearest neighbor. Compared with other related methods, it comes close to the best results in accuracy, F1 measure, and AUC.
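Keywords: web spam detection; feature selection; K nearest neighbor; imbalanced data classification; cost-sensitive analysis
Classification number: TP391.6 [Automation and Computer Technology: Computer Application Technology]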
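The selection procedure summarized in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation. The abstract does not state the exact Fisher score formula, the weighting scheme, or how the training-set AUC is computed, so the code assumes the common between-class over within-class variance ratio, inverse class frequency weights, Euclidean distance, and leave-one-out scoring on the training set. All function names and the synthetic data are hypothetical.

```python
import random

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter (assumed form)."""
    n, d = len(X), len(X[0])
    scores = []
    for j in range(d):
        col = [row[j] for row in X]
        mu = sum(col) / n
        between = within = 0.0
        for c in sorted(set(y)):
            vals = [v for v, label in zip(col, y) if label == c]
            mu_c = sum(vals) / len(vals)
            between += len(vals) * (mu_c - mu) ** 2
            within += sum((v - mu_c) ** 2 for v in vals)
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def weighted_knn_scores(X_train, y_train, X_query, feats, k, weights, skip_self=False):
    """Spam score per query: weighted share of spam (label 1) among the k nearest neighbours."""
    out = []
    for qi, q in enumerate(X_query):
        dists = []
        for ti, t in enumerate(X_train):
            if skip_self and ti == qi:
                continue  # leave-one-out when scoring the training set against itself
            dists.append((sum((q[f] - t[f]) ** 2 for f in feats), ti))
        dists.sort()
        nearest = [ti for _, ti in dists[:k]]
        total = sum(weights[y_train[ti]] for ti in nearest)
        spam = sum(weights[y_train[ti]] for ti in nearest if y_train[ti] == 1)
        out.append(spam / total)
    return out

def auc(y_true, scores):
    """AUC as the fraction of (spam, non-spam) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select_features(X, y, k=5):
    """Add features in descending Fisher score order; keep one only if training AUC rises."""
    counts = {c: y.count(c) for c in set(y)}
    # Assumed weighting: inverse class frequency, so the minority class counts more.
    weights = {c: len(y) / (len(counts) * m) for c, m in counts.items()}
    scores = fisher_scores(X, y)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept, best = [], 0.0
    for j in order:
        trial = kept + [j]
        a = auc(y, weighted_knn_scores(X, y, X, trial, k, weights, skip_self=True))
        if a > best:
            kept, best = trial, a
    return kept, best, weights

def make_data(n_ham, n_spam):
    """Synthetic imbalanced data: feature 0 is informative, features 1 and 2 are noise."""
    X, y = [], []
    for label, n in ((0, n_ham), (1, n_spam)):
        for _ in range(n):
            X.append([random.gauss(3.0 * label, 1.0), random.gauss(0, 1), random.gauss(0, 1)])
            y.append(label)
    return X, y

random.seed(0)
X_train, y_train = make_data(80, 20)
X_test, y_test = make_data(40, 10)
kept, train_auc, weights = select_features(X_train, y_train)
test_auc = auc(y_test, weighted_knn_scores(X_train, y_train, X_test, kept, 5, weights))
print("kept features:", kept, "train AUC:", round(train_auc, 3), "test AUC:", round(test_auc, 3))
```

The greedy loop evaluates each candidate feature once, so it needs only as many classifier runs as there are features, which is what makes this kind of filter-then-wrapper selection cheaper than an exhaustive subset search.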