Research on Outlier Detection Algorithm Based on Improved Distance
(Chinese title, translated: Research on an Outlier Detection Algorithm Based on Improved Distance Sum)

Cited by: 11


Authors: LI Chun-sheng (李春生) [1]; YU Shu (于澍); LIU Xiao-gang (刘小刚) (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)

Affiliation: [1] School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China

Source: Computer Technology and Development (《计算机技术与发展》), 2019, No. 3, pp. 97-100 (4 pages)

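Funding: National Natural Science Foundation of China, General Program (51774090); Natural Science Foundation of Heilongjiang Province, General Program (F2015020); Heilongjiang Province Special Guiding Innovation Fund for Education Research (2017YDL-12); Heilongjiang Province Major Education Planning Project (GJ20170006)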

Abstract: To reduce the influence of erroneous records in raw data and improve data quality, we analyze the commonly used distance-based outlier detection algorithms in depth and propose a new outlier detection algorithm based on an improved distance, which dispenses with the DB(d,p) parameter setting required by the traditional algorithm. First, to address the difficulty end users face in choosing detection attributes under uncertainty, the concept of "attribute membership degree" is introduced to simplify attribute selection. Second, to address the low detection accuracy caused by unevenly distributed data, the commonly used distance measure is improved: an improved weighted distance is used to compute a distance matrix, and by analyzing the total (summed) distance of each point, an anomaly evaluation method is proposed to judge how abnormal each outlier is. Finally, experiments on stock trading data, compared against the traditional distance-sum-based detection algorithm, show that the improved algorithm clearly improves outlier detection accuracy.
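The abstract describes the pipeline only at a high level (weighted distance, distance matrix, per-point distance sum, anomaly score), so the following is a minimal Python sketch of a generic distance-sum outlier score under stated assumptions, not the authors' method. The per-attribute weights are an assumption (inverse attribute variance, i.e. a standardized Euclidean distance) because the paper's improved weighting is not given here; "attribute membership degree" is not modeled at all. The names `weighted_distance_matrix`, `distance_sum_scores`, `flag_outliers` and the `threshold` parameter are hypothetical. Unlike DB(d,p), which requires choosing a distance d and a fraction p, this score needs only a cutoff on the relative distance sum.

```python
import numpy as np


def weighted_distance_matrix(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Pairwise weighted Euclidean distances: d(i, j) = sqrt(sum_k w_k * (x_ik - x_jk)^2)."""
    diff = X[:, None, :] - X[None, :, :]          # shape (n, n, m) via broadcasting
    return np.sqrt((w * diff ** 2).sum(axis=2))   # shape (n, n), zero diagonal


def distance_sum_scores(X, w=None) -> np.ndarray:
    """Score each point by its total distance to all others, relative to the mean total."""
    X = np.asarray(X, dtype=float)
    if w is None:
        # ASSUMPTION: inverse-variance attribute weights (standardized Euclidean distance);
        # the paper's own improved weighting is not specified in the abstract.
        var = X.var(axis=0)
        w = np.zeros_like(var)
        np.divide(1.0, var, out=w, where=var > 0)  # constant attributes get weight 0
    D = weighted_distance_matrix(X, np.asarray(w, dtype=float))
    totals = D.sum(axis=1)                          # distance sum for each point
    mean = totals.mean()
    # Relative score: > 1 means the point is farther from the rest than average.
    return totals / mean if mean > 0 else np.zeros_like(totals)


def flag_outliers(scores: np.ndarray, threshold: float = 1.5) -> list:
    """HYPOTHETICAL rule: indices whose relative distance sum exceeds threshold, highest first."""
    return [int(i) for i in np.argsort(-scores) if scores[i] > threshold]


if __name__ == "__main__":
    # Toy data: four similar records and one far-off record (index 4).
    X = [[1.0, 10.0], [1.1, 10.2], [0.9, 9.9], [1.0, 10.1], [5.0, 30.0]]
    scores = distance_sum_scores(X)
    print(np.round(scores, 2))
    print(flag_outliers(scores))
```

Dividing by the mean distance sum makes the score scale-free, so one cutoff can be reused across datasets; how the paper actually grades the degree of abnormality would need to be checked against the full text.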

Keywords: data mining; improved distance; outlier detection; distance sum

CLC number: TP301.6 [Automation and Computer Technology / Computer System Architecture]

 
