面向多尺度数据挖掘的数据尺度划分方法  被引量:6

Data Scaling Method for Multi-scale Data Mining

在线阅读下载全文

作  者:张昉 赵书良[1,2] 武永亮 ZHANG Fang;ZHAO Shu-liang;WU Yong-liang(College of Mathematics & Information Science,Hebei Normal University,Shijiazhuang 050024,China;Hebei Key Laboratory of Computational Mathematics & Applications,Hebei Normal University,Shijiazhuang,050024,China)

机构地区:[1]河北师范大学数学与信息科学学院,石家庄050024 [2]河北师范大学河北省计算数学与应用重点实验室,石家庄050024

出  处:《计算机科学》2019年第4期57-65,共9页Computer Science

基  金:国家自然科学基金资助项目(71271067);国家社科基金重大项目(13&ZD091;18ZDA200);河北师范大学硕士基金资助项目(CXZZSS2017048)资助

摘  要:多尺度挖掘在图形图像、地理信息、信号分析、数据挖掘等领域已有应用,多尺度数据挖掘在关联规则、聚类、分类挖掘领域也有相关研究与应用,但对如何对数据集进行普适性的多尺度划分以及如何构建多尺度数据集仍未展开研究,已有相关研究缺乏深度。文中从多尺度数据挖掘任务入手,定义了尺度概念,并给出了多尺度化数据集模型,以及基准尺度评分模型;依据概率密度估计的离散化方法提出了多尺度划分算法,扩展了可划分尺度的数据类型,划分结果更贴近数据的多尺度特性,且具有较低的时间复杂度;提出了多尺度化数据集方法、构建多尺度数据集算法和基准尺度选择算法,将多尺度熵与信息熵作为评价方法,在扩充多尺度化数据集方法的基础上,有效减弱了多尺度数据挖掘中因尺度推衍而产生的尺度效应,算法的时间复杂性也较为可控。利用H省真实人口数据集、UCI公用数据集和T10I4D100K数据集对所提算法和模型进行验证与实验分析,结果表明多尺度划分算法和多尺度化数据集方法是可行的,提出的多尺度化数据集方法和基准尺度评分模型是有效的,多尺度划分方法、构建多尺度数据集方法和基准尺度选择方法的应用平均提高了尺度推衍过程中1.6%的覆盖率、2.1%的F1-measure和3.7%的正确率,且具有较低的平均支持度误差。Multi-scale mining has been applied in the fields of graphic images,geographic information,signal analysis,data mining,etc,and also has related research and application in the fields of association rules,clustering and classification mining.Nevertheless how to divide datasets into common scales and how to construct multi-scale datasets have not been studied in depth.Starting with the task of multi-scale data mining,this paper defined the concept of scale and gave a multi-scale dataset model and a benchmark scale scoring model.This paper proposed a multi-scale partition algorithm based on the discretization method of probability density estimation,which extends the data types of divisible scales,and its partition results are closer to the multi-scale characteristics of data with lower time complexity.This paper also proposed a multi-scale dataset method,a multi-scale data set algorithm and a benchmark scale selection algorithm.Multi-scale entropy and information entropy were used as evaluation methods.On the basis of expanding the multi-scale dataset method,the scale effect produced by the meso-scale derivation of multi-scale data mining can be effectively reduced,and the time complexity can be controlled.The proposed algorithm and model were validated and analyzed by using the real population dataset of H province,UCI common dataset and IBM dataset.The experimental results show that the proposed method is feasible and the proposed model is effective.The application of the proposed methods improves coverage by 1.6%, F1-measure by 2.1% and accuracy by 3.7% in scale deduction process,and has low average support error.

关 键 词:多尺度数据挖掘 多尺度划分 离散化 构建多尺度数据集 基准尺度选择 多尺度熵 信息熵 

分 类 号:TP391[自动化与计算机技术—计算机应用技术]

 

参考文献:

正在载入数据...

 

二级参考文献:

正在载入数据...

 

耦合文献:

正在载入数据...

 

引证文献:

正在载入数据...

 

二级引证文献:

正在载入数据...

 

同被引文献:

正在载入数据...

 

相关期刊文献:

正在载入数据...

相关的主题
相关的作者对象
相关的机构对象