Authors: RAN Juan [1]; REN Qiong [2]
Affiliations: [1] Department of Computer Science and Technology, Tianjin University Renai College, Tianjin 301636, China; [2] School of Mathematics and Computer Science, Jianghan University, Wuhan, Hubei 430056, China
Source: Computer Simulation (《计算机仿真》), 2018, No. 12, pp. 451-455 (5 pages)
Abstract: Effectively detecting missing information during big-data storage not only avoids anomalies in user data queries but also improves the accuracy and completeness of mining and analysis on incomplete data. With current missing-information detection methods, the detection delay introduced by the detection algorithm grows exponentially as the data volume increases, which degrades detection accuracy and can even cause the system program to block or crash. To reduce the detection delay of existing methods while maintaining detection accuracy, a missing-information detection method based on distributed optimized nearest-neighbor clustering is proposed. First, affinity propagation is used to cluster the incomplete data set, dividing it into a complete and an incomplete data set, and the proposed interval similarity is used to assign data of the same class to the same cluster. This clustering approach avoids interference from other objects and helps improve clustering accuracy and speed. Then, to further improve the execution efficiency of the detection algorithm, a distributed computing scheme is designed to optimize the clustering process, so that clustering, the main time-consuming operation, runs in parallel. Finally, information entropy is computed over the same-class objects obtained from clustering to detect the missing information. Simulation verifies that the proposed method markedly reduces the detection delay for missing information in incomplete data while retaining good detection accuracy.
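Keywords: incomplete data; missing information; affinity propagation; interval similarity; distributed computing
Classification No.: TP391.9 [Automation and Computer Technology: Computer Application Technology]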
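Illustrative sketch (not from the paper): the abstract says that, after clustering, information entropy is computed over the objects in each cluster to detect missing information, but this record gives no formulas. The Python below only shows what such a step might look like under simple assumptions: each cluster is a list of equal-length rows, a missing value is represented as `None`, and the entropy is the ordinary Shannon entropy of the observed values in each column. The function names `shannon_entropy`, `column_entropies`, and `missing_cells` are hypothetical, and the interval similarity and distributed clustering stages are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the non-missing values in a list.

    Assumption: missing values are represented as None and are ignored.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        return 0.0
    counts = Counter(observed)
    total = len(observed)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def column_entropies(cluster):
    """Entropy of each attribute (column) within one cluster of rows."""
    return [shannon_entropy(list(column)) for column in zip(*cluster)]


def missing_cells(cluster):
    """(row, column) index pairs of the cells that are None in one cluster."""
    return [
        (i, j)
        for i, row in enumerate(cluster)
        for j, value in enumerate(row)
        if value is None
    ]


if __name__ == "__main__":
    # Hypothetical cluster of three records with two attributes each.
    cluster = [["a", 1], [None, 2], ["a", None]]
    print(column_entropies(cluster))
    print(missing_cells(cluster))
```

In this sketch a low column entropy means the observed values in a cluster agree closely, which is the kind of signal an entropy-based detector could use when judging a missing cell. How the paper actually uses entropy is not stated in this record.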