Research on Outlier Detection Algorithm Based on Teaching Evaluation Data (教学评价数据的离群点检测算法研究)  Cited by: 1


Authors: LI Hui [1,2], WANG Guo-qiang [1], GUO Rui-qiang [1,2], GAO Jing-wei [1], BAO Yan-min

Affiliations: [1] College of Mathematics and Information Science, Hebei Normal University, Shijiazhuang 050024; [2] Hebei Provincial Key Laboratory of Computational Mathematics and Applications (Hebei Normal University), Shijiazhuang 050024

Source: Software (《软件》), 2017, No. 4, pp. 18-25 (8 pages)

Funding: Supported by a Hebei Normal University teaching reform project (2015XJJG023)

Abstract: Teaching evaluation is an indispensable part of university teaching. Some students deliberately inflate or deflate their scores, or submit ratings without taking the evaluation seriously; these outlying records should be detected and removed to improve the correctness of student evaluation data. Outlier detection is an important research area in data mining. The teaching evaluation data used in this paper's experiments are categorical, and the outlier detection algorithms commonly used for categorical data are the greedy algorithm based on information entropy and the frequency-based AVF algorithm. To address the high time complexity of the greedy algorithm and the limited accuracy of AVF, this paper proposes an improved frequency-based outlier detection algorithm. The algorithm first clusters the teaching evaluation data with an improved k-modes algorithm, using an adjusted cosine similarity formula as the similarity measure, then screens out candidate outliers that lie far from the cluster centers, and finally applies frequency-based (AVF) outlier detection to the candidate set. Experiments on real data sets show that the algorithm has advantages in both accuracy and efficiency.
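For readers unfamiliar with the AVF (Attribute Value Frequency) baseline named in the abstract, the following is a minimal sketch of the standard frequency-based scoring idea, not the authors' improved algorithm. Each record is scored by the average frequency of its attribute values, and the lowest-scoring records are reported as outliers. The function names `avf_scores` and `lowest_avf`, the optional `candidates` argument (standing in for a candidate set produced by some earlier clustering step), and the sample ratings are all assumptions made for illustration.

```python
from collections import Counter


def avf_scores(records):
    """Return one AVF score per record: the mean count of its attribute values."""
    if not records:
        return []
    n_attrs = len(records[0])
    # One frequency table per attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    return [
        sum(counts[j][row[j]] for j in range(n_attrs)) / n_attrs
        for row in records
    ]


def lowest_avf(records, k, candidates=None):
    """Indices of the k lowest-scoring records.

    If `candidates` is given (hypothetical output of a screening step),
    only those indices are ranked; frequencies still use all records.
    """
    scores = avf_scores(records)
    pool = range(len(records)) if candidates is None else candidates
    return sorted(pool, key=lambda i: scores[i])[:k]


# Hypothetical categorical ratings (three questions, grades A-D).
ratings = [
    ("A", "A", "B"),
    ("A", "A", "B"),
    ("A", "B", "B"),
    ("A", "A", "A"),
    ("D", "D", "D"),
]

print(lowest_avf(ratings, 1))
```

In this toy data the all-"D" record uses values that appear only once in each column, so it receives the lowest score. Restricting the ranking to a candidate set, as the abstract describes, reduces how many records need to be ranked in the final step.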

Keywords: outlier detection; k-modes clustering; cosine similarity; categorical data

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]

 
