Detection and Simulation of Missing Information in the Big Data Storage Process

Cited by: 3

Authors: RAN Juan [1]; REN Qiong [2]

Affiliations: [1] Department of Computer Science and Technology, Tianjin University Renai College, Tianjin 301636, China; [2] School of Mathematics and Computer Science, Jianghan University, Wuhan, Hubei 430056, China

Source: Computer Simulation (《计算机仿真》), 2018, No. 12, pp. 451-455 (5 pages)

Abstract: Effectively detecting missing information during big data storage not only prevents anomalies in user data queries but also improves the accuracy and completeness of mining and analysis on incomplete data. With current missing-information detection methods, the detection delay introduced by the detection algorithm grows exponentially as the data volume rises, which degrades detection accuracy and can even cause the system to block or crash. To reduce the detection delay of existing methods while preserving detection accuracy, a missing-information detection method based on distributed, optimized affinity propagation clustering is proposed. First, affinity propagation is used to cluster the incomplete dataset, dividing it into a complete dataset and an incomplete dataset, and the proposed interval similarity is used to assign data of the same class to the same cluster. This clustering approach avoids interference from other objects and helps improve both the accuracy and the speed of clustering. Then, to further improve the execution efficiency of the detection algorithm, a distributed computation is designed to optimize the clustering process, so that clustering, the main time-consuming operation, is computed in parallel. Finally, information entropy is computed over the same-class objects obtained from clustering, and the missing information is detected. Simulation verifies that the proposed method clearly reduces the detection delay for missing information in incomplete data while also achieving good detection accuracy.

Keywords: incomplete data; missing information; affinity propagation; interval similarity; distributed computing

Classification code: TP391.9 [Automation and Computer Technology: Computer Application Technology]
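
The abstract names two steps that can be illustrated without knowing the paper's internals: splitting the data into complete and incomplete records, and computing information entropy over objects that fall in the same cluster. The following is a minimal sketch of those two steps only, not the authors' implementation. The record layout, the use of `None` to mark a missing cell, and all function names are assumptions made for illustration. The interval similarity and the distributed affinity propagation step are not sketched because the abstract does not define them.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Assumption: a record is a sequence of categorical values, with None
# marking a missing cell. The paper's actual data model is not given.
Record = Sequence[Optional[str]]


def split_by_completeness(records: Sequence[Record]) -> Tuple[List[Record], List[Record]]:
    """Partition records into (complete, incomplete) lists."""
    complete = [r for r in records if all(v is not None for v in r)]
    incomplete = [r for r in records if any(v is None for v in r)]
    return complete, incomplete


def shannon_entropy(values: Sequence[Optional[str]]) -> float:
    """Shannon entropy, in bits, of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    total = len(observed)
    counts = Counter(observed)
    # H = sum over distinct values of -p * log2(p)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def attribute_entropies(cluster: Sequence[Record]) -> List[float]:
    """Entropy of each attribute column within one cluster of records."""
    return [shannon_entropy(column) for column in zip(*cluster)]


def missing_cells(records: Sequence[Record]) -> List[Tuple[int, int]]:
    """(row, column) positions of every missing cell."""
    return [
        (i, j)
        for i, record in enumerate(records)
        for j, value in enumerate(record)
        if value is None
    ]
```

Under these assumptions, an attribute whose entropy within a cluster is close to zero is one on which the cluster's members agree, so a missing cell in that column is the kind of gap that same-cluster objects could help characterize. How the paper actually uses the entropy values is not stated in the abstract.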
